





ON THE ASYMPTOTIC BEHAVIOR OF DECISION PROCEDURES


By JACK LADERMAN
Columbia University 


0. Summary and introduction. In this paper, the asymptotic behavior of de- 
cision procedures will be studied for a particular class of multiple decision prob- 
lems. The study will throw some light on the desirability of the minimax decision 
procedure when the number of observations is large, and it will be seen that 
decision procedures frequently exist which are superior to the minimax decision 
procedure for large samples. The fact that the minimax decision procedure may 
be desirable in certain problems for small samples but undesirable for large 
samples was revealed by Hodges and Lehmann [1] in connection with estimation
problems. Robbins [2] suggested the term "asymptotically subminimax"
for the type of superior procedures which may then exist. A definition of this 
term which will be useful for an investigation of the asymptotic behavior of 
decision procedures, will be given in Section 1. A major part of this paper will 
be concerned with certain sequences of decision procedures called asymptotically 
admissible which have desirable properties similar to those of admissible de- 
cision procedures for the case of some fixed sample size. These asymptotically 
admissible decision procedures include a subclass of the asymptotically submini- 
max procedures, and the sequences of minimax procedures for those problems 
for which asymptotically subminimax procedures do not exist. 

The problems to be considered are those in which a random variable, $X$, is
known to have a distribution function belonging to the distribution space,
$\Omega = \{F_i(x)\}$, $i = 1, 2, \cdots, k$, and it is desired to select the true distribution
function based on a sample of $n$ independent observations of $X$. It will be assumed
that all $F_i(x)$ are absolutely continuous distribution functions having
density functions, $f_i(x)$, and that for every constant $K$, the set of points for which
$f_i(x)/f_j(x) = K$ ($i \ne j$) is a set of probability measure zero under every $F_i$ and
all possible $i$ and $j$. A simple loss function, $W(F_i, d_j)$, where $d_j$ is the decision to
select $F_j$, will be used with $W(F_i, d_j) = 1$ if an incorrect decision is made (i.e.
$i \ne j$), and $W(F_i, d_j) = 0$ if a correct decision is made (i.e. $i = j$). For such a
loss function, the expected loss is simply the probability of making an incorrect
decision.

Section 1 will be concerned with asymptotically minimax sequences of de- 
cision procedures and Section 2 will be concerned with asymptotic admissibility. 
It will be seen that the asymptotic behavior of the minimax decision procedure 
depends on the limits of the components in the sequence of least favorable a
priori distributions. Theorem 2.2 gives a sufficient condition, in terms of these 
limit values, for the minimax procedure to be asymptotically admissible. 

In Section 3, a detailed study will be made of the class of problems where
$\Omega$ consists of $k$ univariate normal distributions having the same variance but
different means. Let the means be denoted by $\theta_i$ ($i = 1, 2, \cdots, k$) with
$\theta_1 < \theta_2 < \cdots < \theta_k$, and let $\min_i(\theta_{i+1} - \theta_i) = \gamma$. Then Theorems 3.1 and 3.6 will show
that the minimax procedure is asymptotically admissible when the means can
be put into sets, each set containing the same number, $\tau \ge 2$, of consecutive
means, with a difference of $\gamma$ between any two consecutive means of a set, and a
difference greater than $\gamma$ between any two means not belonging to the same set.
Theorems 3.2 and 3.3 will show that in all other cases the minimax procedure is 
asymptotically inadmissible, and asymptotically subminimax procedures will be 
constructed for all these cases. Although a complete study of the asymptotic 
admissibility of asymptotically subminimax procedures will not be made in this 
paper, Theorem 3.7 will show that a certain asymptotically subminimax pro- 
cedure is asymptotically admissible for all the cases covered by Theorem 3.3 
and for some of the cases covered by Theorem 3.2. On the other hand, for those 
cases covered by Theorem 3.2 with an $\Omega$ consisting of only 3 means, it will be
shown that every asymptotically subminimax procedure is asymptotically inad- 
missible. 


1. Asymptotically minimax decision procedures. Consider an $\Omega$ consisting of
only two distribution functions, $F_1(x)$ and $F_2(x)$, having density functions $f_1(x)$
and $f_2(x)$ respectively, and let the observed values be $x_1, x_2, \cdots, x_n$. Using a
simple loss function and denoting the least favorable a priori distribution by
$\bar g_1^{(n)}$ and $1 - \bar g_1^{(n)}$, the minimax decision procedure, $\bar\delta_n$, is as follows:
Select $F_1(x)$ if

$$\bar g_1^{(n)} \prod_{j=1}^{n} f_1(x_j) > (1 - \bar g_1^{(n)}) \prod_{j=1}^{n} f_2(x_j),$$

and select $F_2(x)$ if

$$\bar g_1^{(n)} \prod_{j=1}^{n} f_1(x_j) < (1 - \bar g_1^{(n)}) \prod_{j=1}^{n} f_2(x_j).$$

Throughout this paper we shall ignore the possibility of equality in the above
expressions because by the previous assumption the probability of such an event
is zero. The risks associated with the minimax procedure are

$$\begin{aligned}
r_1(\bar\delta_n) &= \Pr\Big\{ \bar g_1^{(n)} \prod_{j=1}^{n} f_1(x_j) < (1 - \bar g_1^{(n)}) \prod_{j=1}^{n} f_2(x_j) \,\Big|\, f_1(x) \text{ is the true density} \Big\}, \\
r_2(\bar\delta_n) &= \Pr\Big\{ \bar g_1^{(n)} \prod_{j=1}^{n} f_1(x_j) > (1 - \bar g_1^{(n)}) \prod_{j=1}^{n} f_2(x_j) \,\Big|\, f_2(x) \text{ is the true density} \Big\}.
\end{aligned} \tag{1.1}$$

For the minimax procedure, the two risks are equal, and their common value
will be denoted by $r(\bar\delta_n)$.
Let us suppose that the sequence $\bar g_1^{(1)}, \bar g_1^{(2)}, \cdots, \bar g_1^{(n)}, \cdots$ is a null sequence,
and consider the sequence of Bayes procedures, $\{\delta_n\}$, corresponding to a sequence
of a priori distributions, $\{g_1^{(n)}, 1 - g_1^{(n)}\}$, where the $g_1^{(n)}$ satisfy the following two
conditions:

(A) $g_1^{(n)} \ge \bar g_1^{(n)}$ except for a finite number of values of $n$.

(B) The sequence, $\{g_1^{(n)}\}$, is a null sequence.

It then follows from (1.1) and (A) that there exists an integer, $N$, such that if
$n \ge N$, the risks, $r_i(\delta_n)$, satisfy

$$r_1(\delta_n) \le r(\bar\delta_n) \le r_2(\delta_n). \tag{1.2}$$


Since a Bayes procedure minimizes the average risk for the corresponding a priori
distribution, we have

$$g_1^{(n)} r_1(\delta_n) + (1 - g_1^{(n)}) r_2(\delta_n) \le r(\bar\delta_n). \tag{1.3}$$

By dividing (1.3) by $r(\bar\delta_n)$ and using (1.2) and (B), we obtain

$$\lim_{n\to\infty} \frac{r_2(\delta_n)}{r(\bar\delta_n)} = 1. \tag{1.4}$$

Thus, $\{\delta_n\}$ is a sequence of Bayes procedures with the ratio of its maximum
risk to the common minimax risk approaching 1 as $n$ approaches infinity. Clearly,
for large $n$, $\{\delta_n\}$ can not be much worse than the minimax procedure when $F_2(x)$
is the true distribution, but when $F_1(x)$ is the true distribution $\{\delta_n\}$ may conceivably
be much better than the minimax procedure. For this reason it seemed
desirable to study the class of decision procedures having such properties in the
more general case when $\Omega$ contains $k$ distributions.

We start out with the following definition: 

A sequence of decision procedures, $\{\delta_n\}$, where $n$ corresponds to the number of
observations, will be said to be asymptotically minimax if

$$\lim_{n\to\infty} \frac{\max_i r_i(\delta_n)}{r(\bar\delta_n)} = 1, \tag{1.5}$$

where $r_i(\delta_n)$ denotes the risk associated with $\delta_n$ when $F_i(x)$ is the true distribution
function, and $r(\bar\delta_n)$ denotes the minimax risk.

In Section 3, examples will be given for which asymptotically minimax de- 
cision procedures exist with the ratio of one of its risks to the common minimax 
risk approaching zero! For sufficiently large n, such a procedure would be more 
desirable than the minimax procedure for most problems arising in practice. 

A lemma will now be given which will be used in the proof of Theorem 1.1. 

LEMMA 1.1. If $\delta$ is a Bayes procedure relative to an a priori distribution, $g$, and
$\bar\delta$ is the minimax procedure, then

$$\min_i r_i(\delta) \le r(\bar\delta) \le \max_i r_i(\delta). \tag{1.6}$$

Proof. From the definition of a minimax procedure, we have that $r(\bar\delta) \le \max_i r_i(\delta)$.
To prove the other inequality assume $\min_i r_i(\delta) > r(\bar\delta)$; it then follows
that

$$\sum_{i=1}^{k} g_i r_i(\delta) > \sum_{i=1}^{k} g_i r_i(\bar\delta). \tag{1.7}$$

Since the Bayes procedure, $\delta$, minimizes the average risk when $g$ is the a priori
distribution, (1.7) is impossible. This contradiction completes the proof.

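THEOREM 1.1. A sufficient condition for a sequence of Bayes procedures, $\{\delta_n\}$,
to be asymptotically minimax is

$$\lim_{n\to\infty} \frac{r_i(\delta_n)}{r_j(\delta_n)} = 1 \quad \text{for all } i, j = 1, 2, \cdots, k. \tag{1.8}$$

Proof. Equation (1.8) implies

$$\lim_{n\to\infty} \frac{\max_i r_i(\delta_n)}{\min_i r_i(\delta_n)} = 1. \tag{1.9}$$

From Lemma 1.1 we have for all $n$

$$1 \le \frac{\max_i r_i(\delta_n)}{r(\bar\delta_n)} \le \frac{\max_i r_i(\delta_n)}{\min_i r_i(\delta_n)}. \tag{1.10}$$

Hence

$$\lim_{n\to\infty} \frac{\max_i r_i(\delta_n)}{r(\bar\delta_n)} = 1, \tag{1.11}$$

which completes the proof.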


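THEOREM 1.2. A necessary condition for a sequence of Bayes procedures, $\{\delta_n\}$,
corresponding to the a priori distributions, $\{g^{(n)}\}$, to be asymptotically minimax is

$$\lim_{n\to\infty} \frac{r_\alpha(\delta_n)}{r_\beta(\delta_n)} = 1 \tag{1.12}$$

for all $\alpha$ and $\beta$ belonging to $\Gamma$, where $\Gamma$ is the set of integers, $j$, for which
$\liminf_{n\to\infty} \bar g_j^{(n)} > 0$.

Proof. Since the minimax procedure, $\bar\delta_n$, is the Bayes procedure corresponding
to $\bar g^{(n)}$, we have

$$\sum_{i=1}^{k} \bar g_i^{(n)} r_i(\delta_n) \ge \sum_{i=1}^{k} \bar g_i^{(n)} r_i(\bar\delta_n), \tag{1.13}$$

and consequently

$$\sum_{i=1}^{k} \bar g_i^{(n)} \frac{r_i(\delta_n)}{r(\bar\delta_n)} \ge 1. \tag{1.14}$$

Since for any $j \in \Gamma$, $\liminf_{n\to\infty} \bar g_j^{(n)} > 0$, there exists a $\delta > 0$ and an integer, $N$,
such that $\bar g_j^{(n)} > \delta$ for all $n > N$. Now for $\{\delta_n\}$ to be asymptotically minimax, for
any given $\epsilon > 0$ there must exist an integer, $M$, such that for all $i$ and all $n > M$,

$$\frac{r_i(\delta_n)}{r(\bar\delta_n)} < 1 + \epsilon\delta. \tag{1.15}$$

Hence for $n > M$,

$$\sum_{i \ne j} \bar g_i^{(n)} (1 + \epsilon\delta) + \bar g_j^{(n)} \frac{r_j(\delta_n)}{r(\bar\delta_n)} \ge 1, \tag{1.16}$$

from which it follows that

$$1 - \bar g_j^{(n)} + \epsilon\delta + \bar g_j^{(n)} \frac{r_j(\delta_n)}{r(\bar\delta_n)} \ge 1. \tag{1.17}$$

Thus, for $n > \max(N, M)$

$$\frac{r_j(\delta_n)}{r(\bar\delta_n)} \ge \frac{\bar g_j^{(n)} - \epsilon\delta}{\bar g_j^{(n)}} > 1 - \epsilon. \tag{1.18}$$

But since $\delta < 1$, from (1.15) we have

$$\frac{r_j(\delta_n)}{r(\bar\delta_n)} < 1 + \epsilon, \tag{1.19}$$

and therefore

$$\lim_{n\to\infty} \frac{r_j(\delta_n)}{r(\bar\delta_n)} = 1. \tag{1.20}$$

It then follows that if both $\alpha$ and $\beta$ belong to $\Gamma$, we have

$$\lim_{n\to\infty} \frac{r_\alpha(\delta_n)}{r_\beta(\delta_n)} = 1. \tag{1.21}$$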


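Theorem 1.2 is useful in proving that certain sequences of Bayes procedures
can not be asymptotically minimax. For example, suppose $\Omega$ consists of $k$ univariate
normal distributions $N(\theta_i, \sigma^2)$, $i = 1, 2, \cdots, k$, with $\theta_i = \theta + (i - 1)\gamma$,
for some $\gamma > 0$. In Section 3 it will be seen that for such a class of distribution
functions, $\liminf_{n\to\infty} \bar g_i^{(n)} > 0$ for all $i$. Now consider the sequence of Bayes
procedures, $\{\delta_n\}$, where $\delta_n$ corresponds to the a priori distribution $g_i^{(n)} = 1/k$
for all $i$. According to the decision procedure $\delta_n$ one selects

$$\begin{aligned}
&N(\theta_1, \sigma^2) && \text{if } \bar X < \theta_1 + \tfrac12\gamma, \\
&N(\theta_i, \sigma^2) && \text{if } \theta_i - \tfrac12\gamma < \bar X < \theta_i + \tfrac12\gamma, \quad i = 2, \cdots, k - 1, \\
&N(\theta_k, \sigma^2) && \text{if } \bar X > \theta_k - \tfrac12\gamma,
\end{aligned}$$

where $\bar X = \sum_{j=1}^{n} x_j/n$. For $k > 2$ it is easily seen that $r_2(\delta_n)/r_1(\delta_n) = 2$ for all $n$.
Therefore $\{\delta_n\}$ can not be asymptotically minimax.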
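The ratio of 2 in this example can be checked numerically. The following is a minimal sketch, not part of the original paper: the function name `uniform_prior_risks` and its arguments are illustrative assumptions, and the normal distribution function is evaluated through the complementary error function.

```python
from math import erfc, sqrt


def uniform_prior_risks(k, gamma, sigma, n):
    """Risks r_1, ..., r_k of the Bayes rule for the uniform prior g_i = 1/k.

    Assumes equally spaced means theta_i = theta + (i - 1) * gamma, so the
    decision boundaries are the midpoints between neighbouring means and the
    sample mean is N(theta_i, sigma^2 / n) under the i-th distribution.
    """
    # Probability that the sample mean crosses one neighbouring midpoint:
    # G(-sqrt(n) * gamma / (2 * sigma)), with G the standard normal d.f.
    z = sqrt(n) * gamma / (2.0 * sigma)
    tail = 0.5 * erfc(z / sqrt(2.0))
    risks = []
    for i in range(1, k + 1):
        # End means have one neighbouring boundary, interior means have two.
        sides = (1 if i > 1 else 0) + (1 if i < k else 0)
        risks.append(sides * tail)
    return risks


risks = uniform_prior_risks(k=4, gamma=1.0, sigma=1.0, n=25)
print(risks[1] / risks[0])
```

Because an interior mean can be missed on either side while an end mean can be missed on one side only, the ratio does not depend on $n$, which is why the uniform-prior sequence fails condition (1.12).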
The following theorem is of interest in connection with the construction of
asymptotically minimax procedures. It includes the special case that if only one
component of $\bar g^{(n)}$ approaches zero, say $\bar g_\alpha^{(n)} \to 0$, and none of the sequences of
the other components has zero as a limit, then given any null sequence, $\{h^{(n)}\}$,
such that $h^{(n)} \ge \bar g_\alpha^{(n)}$ from some $n$ on, it is always possible to construct an
asymptotically minimax sequence of Bayes procedures with $h^{(n)}$ as the $\alpha$th
component of $g^{(n)}$.

THEOREM 1.3. If $\{\bar g_\alpha^{(n)}\}$ is a null sequence and $\{\bar g_i^{(n)}/\bar g_\alpha^{(n)}\}$ is bounded for all $i \in \tau$,
where $\tau$ is the set of integers, $i$, for which $\{\bar g_i^{(n)}\}$ is a null sequence, and if
$\liminf_{n\to\infty} \bar g_i^{(n)} > 0$ for all $i \notin \tau$, then an asymptotically minimax sequence of Bayes procedures
can be found with any given sequence, $\{h^{(n)}\}$, as the $\alpha$th components of $\{g^{(n)}\}$,
provided $\{h^{(n)}\}$ is a null sequence and $h^{(n)} \ge \bar g_\alpha^{(n)}$ except for at most a finite number
of values of $n$.

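Proof. Let $\{\delta_n\}$ be the sequence of Bayes procedures relative to the sequence of
a priori distributions, $\{g^{(n)}\}$, given by

$$\begin{aligned}
g_i^{(n)} &= \frac{h^{(n)}}{\bar g_\alpha^{(n)}}\, \bar g_i^{(n)} && \text{for } i \in \tau, \\
g_i^{(n)} &= \frac{1 - \dfrac{h^{(n)}}{\bar g_\alpha^{(n)}} \sum_{j \in \tau} \bar g_j^{(n)}}{\sum_{j \notin \tau} \bar g_j^{(n)}}\, \bar g_i^{(n)} && \text{for } i \notin \tau.
\end{aligned} \tag{1.22}$$

It can easily be verified that from some $n$ on, each $g^{(n)}$ defined by (1.22) is a
probability distribution, and its $\alpha$th component is $h^{(n)}$. Also, for all $j \in \tau$,

$$\begin{aligned}
g_i^{(n)}/g_j^{(n)} &= \bar g_i^{(n)}/\bar g_j^{(n)} && \text{for } i \in \tau, \\
g_i^{(n)}/g_j^{(n)} &\le \bar g_i^{(n)}/\bar g_j^{(n)} && \text{for } i \notin \tau.
\end{aligned} \tag{1.23}$$

Hence

$$\begin{aligned}
r_i(\delta_n) &\le r(\bar\delta_n) && \text{for } i \in \tau, \\
r_i(\delta_n) &\ge r(\bar\delta_n) && \text{for } i \notin \tau.
\end{aligned} \tag{1.24}$$

Therefore we need only prove that

$$\lim_{n\to\infty} \frac{r_j(\delta_n)}{r(\bar\delta_n)} = 1 \quad \text{for all } j \notin \tau.$$

Since $\bar g^{(n)}$ maximizes the minimum average risk, we have

$$\sum_{i=1}^{k} g_i^{(n)} r_i(\delta_n) \le \sum_{i=1}^{k} \bar g_i^{(n)} r_i(\bar\delta_n) = r(\bar\delta_n). \tag{1.25}$$

Therefore

$$\sum_{i=1}^{k} g_i^{(n)} \frac{r_i(\delta_n)}{r(\bar\delta_n)} \le 1. \tag{1.26}$$

Consider a particular $j \notin \tau$. In view of (1.24), we can replace $r_i(\delta_n)/r(\bar\delta_n)$ in (1.26)
by 1 for all $i \notin \tau$ with $i \ne j$, and drop the nonnegative terms with $i \in \tau$, yielding

$$1 \le \frac{r_j(\delta_n)}{r(\bar\delta_n)} \le \frac{g_j^{(n)} + \sum_{i \in \tau} g_i^{(n)}}{g_j^{(n)}} = 1 + \sum_{i \in \tau} \frac{g_i^{(n)}}{g_j^{(n)}}. \tag{1.27}$$

But since $g_i^{(n)}/g_j^{(n)} \to 0$ for all $i \in \tau$, we have the desired result.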


We now define a sequence of decision procedures, $\{\delta_n\}$, to be asymptotically
subminimax, if $\{\delta_n\}$ is asymptotically minimax and satisfies

$$\limsup_{n\to\infty} \frac{r_j(\delta_n)}{r(\bar\delta_n)} < 1 \tag{1.28}$$

for at least one value of $j$.


2. Asymptotically admissible decision procedures. In view of the existence
of sequences of decision procedures which are asymptotically subminimax, it is
desirable to set up some criterion for distinguishing the more desirable asymptotically
subminimax procedures. In this connection, a sequence of decision
procedures, $\{\delta_n\}$, will be said to be asymptotically admissible if there does not
exist another sequence of decision procedures, $\{\delta_n^*\}$, such that

$$\limsup_{n\to\infty} \frac{r_i(\delta_n^*)}{r_i(\delta_n)} \le 1 \tag{2.1}$$

for all $i$, and the strict inequality holds for at least one value of $i$. When such a
$\{\delta_n^*\}$ exists, $\{\delta_n\}$ will be said to be asymptotically inadmissible.

As a consequence of the above definition, if no asymptotically subminimax 
procedure exists, then the minimax procedure and all asymptotically minimax 
procedures are asymptotically admissible. On the other hand, when an asymp- 
totically subminimax procedure does exist, the minimax procedure is asymptot- 
ically inadmissible. An asymptotically subminimax procedure may or may not 
be asymptotically admissible. 

For large values of n it seems reasonable to require that the procedure selected 
should be asymptotically admissible when such a procedure exists. For this 
reason, it is of interest to know when the minimax procedure is asymptotically 
admissible. 

The following theorem will be helpful in determining the asymptotic behavior 
of sequences of decision procedures. 

THEOREM 2.1. If the sequence of Bayes decision procedures, $\{\delta_n\}$, is asymptotically
minimax and $\liminf_{n\to\infty} \bar g_s^{(n)} > 0$, where $\bar g_s^{(n)}$ is the $s$th component of the
least favorable a priori distribution for $n$ observations, then

$$\lim_{n\to\infty} \frac{r_s(\delta_n)}{r(\bar\delta_n)} = 1. \tag{2.2}$$

Proof. The integer $s$ belongs to the set $\Gamma$ defined in Theorem 1.2, and the proof
of that theorem up to (1.20) proves (2.2).
From Theorem 2.1 it is seen that if $\liminf_{n\to\infty} \bar g_s^{(n)} > 0$, then it is impossible to
construct an asymptotically subminimax procedure with the strict inequality
holding for the $s$th risk. Hence a necessary condition for the existence of an
asymptotically subminimax procedure is that $\liminf_{n\to\infty} \bar g_i^{(n)} = 0$ for at least one
value of $i$. Thus we have the following sufficient condition for the minimax
procedure to be asymptotically admissible.

THEOREM 2.2. If $\liminf_{n\to\infty} \bar g_i^{(n)} > 0$ for all $i$, then the minimax procedure is
asymptotically admissible.

In order to determine the asymptotic behavior of the minimax procedure, it
becomes essential to know whether any of the $\bar g_i^{(n)}$ have zero as a lower limit in
a given problem. If none of them has a zero limit, then the minimax procedure is
asymptotically admissible and would appear to be a good decision procedure
even for large values of $n$. When some of the $\bar g_i^{(n)}$ do have a zero limit, we would
like to know which ones, and whether the minimax procedure is then asymptotically
inadmissible. If this should be the case, we would then like to know if
there is an asymptotically admissible asymptotically subminimax procedure,
and how to find it.

In the next section, a detailed study will be made of the limits of the $\bar g_i^{(n)}$
when $\Omega$ consists of $k$ univariate normal distributions, all having the same variance
but different means. The questions raised in the above paragraph on the
lower limits of the $\bar g_i^{(n)}$, and on the asymptotic admissibility of the minimax
procedure are completely resolved for the decision problems under consideration.
Results are also obtained on the construction of asymptotically subminimax
procedures and on their asymptotic admissibility.


3. Asymptotic theory for normal distributions. Throughout this section it
will be assumed that $\Omega$ consists of $k$ univariate normal distributions, $N(\theta_i, \sigma^2)$,
$i = 1, 2, \cdots, k$, all having the same known variance. Without loss of generality
it will be assumed that the $k$ means are labeled so that $\theta_1 < \theta_2 < \cdots < \theta_k$.
Wald [3] showed that the minimax decision procedure for selecting the true mean
for any fixed sample size, $n$, can then be obtained by determining $k - 1$ points,
$t_1 < t_2 < \cdots < t_{k-1}$, which divide the sample space of $\bar X = \sum_{i=1}^{n} x_i/n$, where $x_i$
is the $i$th observation, into $k$ intervals in such a way that the minimax decision
rule is given by selecting $\theta_i$ if $\bar X$ lies in $(t_{i-1}, t_i)$, where $t_0 = -\infty$ and $t_k = +\infty$.
These $k - 1$ points can be found from the system of equations:

$$\begin{aligned}
\int_{-\infty}^{t_1} p_1(y)\, dy &= \lambda \\
\int_{t_1}^{t_2} p_2(y)\, dy &= \lambda \\
&\;\;\vdots \\
\int_{t_{k-2}}^{t_{k-1}} p_{k-1}(y)\, dy &= \lambda \\
\int_{t_{k-1}}^{\infty} p_k(y)\, dy &= \lambda,
\end{aligned} \tag{3.1}$$

where

$$p_i(y) = \frac{\sqrt{n}}{\sigma\sqrt{2\pi}} \exp\Big\{-\frac{n(y - \theta_i)^2}{2\sigma^2}\Big\}.$$

The value of $\lambda$ obtained from (3.1) is the probability of making a correct decision.
If we let

$$G(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-y^2/2}\, dy, \tag{3.2}$$

then (3.1) can be written as

$$\begin{aligned}
G[\sqrt{n}(t_1 - \theta_1)/\sigma] &= \lambda \\
G[\sqrt{n}(t_2 - \theta_2)/\sigma] - G[\sqrt{n}(t_1 - \theta_2)/\sigma] &= \lambda \\
&\;\;\vdots \\
G[\sqrt{n}(t_{k-1} - \theta_{k-1})/\sigma] - G[\sqrt{n}(t_{k-2} - \theta_{k-1})/\sigma] &= \lambda \\
1 - G[\sqrt{n}(t_{k-1} - \theta_k)/\sigma] &= \lambda.
\end{aligned} \tag{3.3}$$

The least favorable a priori distribution, $\bar g^{(n)}$, can then be obtained from

$$\log \frac{\bar g_i^{(n)}}{\bar g_{i+1}^{(n)}} = \frac{n}{2\sigma^2} (\theta_{i+1} - \theta_i)(2t_i - \theta_{i+1} - \theta_i) \tag{3.4}$$

and

$$\sum_{i=1}^{k} \bar g_i^{(n)} = 1. \tag{3.5}$$

If the transformations $x_i \to x_i/\sigma$ and $\varphi_i = \theta_i/\sigma$ are applied to the observations
and distribution functions respectively, the minimax procedure would yield
decision intervals having end points $\bar c_i = t_i/\sigma$, with the same minimax risk and
least favorable a priori distribution as in the original problem. Thus, we need only
investigate the behavior of the minimax decision procedure for $\Omega = \{N(\varphi_i, 1)\}$,
and then interpret the results obtained in terms of the $\varphi$'s and $\bar c$'s into equivalent
results in terms of the $\theta$'s and $t$'s. For this reason we shall from now on consider
the distribution space to be $\{N(\varphi_i, 1)\}$, without any loss in generality.

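Since we shall be interested in the behavior of the $\bar c_i$ with increasing $n$, we shall
usually write $\bar c_i^{(n)}$ instead of $\bar c_i$. Then, for the distribution space $\{N(\varphi_i, 1)\}$,
(3.3) becomes

$$\begin{aligned}
G[\sqrt{n}(\bar c_1^{(n)} - \varphi_1)] &= \lambda \\
G[\sqrt{n}(\bar c_2^{(n)} - \varphi_2)] - G[\sqrt{n}(\bar c_1^{(n)} - \varphi_2)] &= \lambda \\
&\;\;\vdots \\
G[\sqrt{n}(\bar c_{k-1}^{(n)} - \varphi_{k-1})] - G[\sqrt{n}(\bar c_{k-2}^{(n)} - \varphi_{k-1})] &= \lambda \\
1 - G[\sqrt{n}(\bar c_{k-1}^{(n)} - \varphi_k)] &= \lambda,
\end{aligned} \tag{3.6}$$

and (3.4) becomes

$$\log \frac{\bar g_i^{(n)}}{\bar g_{i+1}^{(n)}} = \tfrac12 n (\varphi_{i+1} - \varphi_i)(2\bar c_i^{(n)} - \varphi_{i+1} - \varphi_i). \tag{3.7}$$

The solution of (3.6) for any given $n$ can be accomplished by various iterative
procedures. We shall not consider these methods in this paper because we shall
be concerned only with the limits of the sequences of the $\bar c_i^{(n)}$ and of the $\bar g_i^{(n)}$.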
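Although the paper does not pursue numerical methods, one possible iterative scheme for (3.6) is easy to sketch, and it is offered here only as an illustration under stated assumptions. The function names `minimax_normal` and `_end_points` are hypothetical. The approach is bisection on $\lambda$: for a trial $\lambda$ the first $k - 1$ equations are solved from left to right, and the last equation decides whether $\lambda$ was too large or too small. The least favorable distribution is then recovered from (3.7) and normalized.

```python
from math import erfc, exp, sqrt
from statistics import NormalDist


def G(t):
    """Standard normal distribution function, via erfc for tail accuracy."""
    return 0.5 * erfc(-t / sqrt(2.0))


def _end_points(lam, phi, n):
    """Solve the first k-1 equations of (3.6) for a trial lambda.

    Returns the end points, or None when lambda is too large to be attained.
    """
    root_n = sqrt(n)
    ends, prev = [], None
    for mean in phi[:-1]:
        # Mass lying to the left of the previous end point under this mean.
        left = 0.0 if prev is None else G(root_n * (prev - mean))
        target = lam + left
        if target >= 1.0:
            return None
        prev = mean + NormalDist().inv_cdf(target) / root_n
        ends.append(prev)
    return ends


def minimax_normal(phi, n, iterations=200):
    """Bisection on lambda; phi is the increasing list of means, unit variance."""
    root_n = sqrt(n)
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        lam = 0.5 * (lo + hi)
        ends = _end_points(lam, phi, n)
        # Last equation of (3.6): mass to the right of the last end point.
        if ends is None or G(root_n * (phi[-1] - ends[-1])) < lam:
            hi = lam  # trial lambda too large
        else:
            lo = lam  # trial lambda attainable
    ends = _end_points(lo, phi, n)
    # Unnormalized prior weights from (3.7), then normalize so they sum to 1.
    weights = [1.0]
    for i, c in enumerate(ends):
        log_ratio = 0.5 * n * (phi[i + 1] - phi[i]) * (2.0 * c - phi[i + 1] - phi[i])
        weights.append(weights[-1] * exp(-log_ratio))
    total = sum(weights)
    return lo, ends, [w / total for w in weights]


lam, ends, prior = minimax_normal([0.0, 1.0, 3.0], n=16)
print(lam, ends, prior)
```

The bisection relies on the observation that every end point produced by `_end_points` is increasing in the trial $\lambda$, so the mass to the right of the last end point is decreasing in $\lambda$. For very large $n$ the exponentials in the last step may overflow or underflow, so a production version would work with logarithms of the weights.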

It will be convenient first to prove several lemmas from which the theorems
concerning the asymptotic behavior of the minimax procedure for all possible
sets of values for the $\varphi_i$ will follow easily.

LEMMA 3.1. If $\{A_n\}$ and $\{B_n\}$ are sequences of real numbers such that
$\lim_{n\to\infty} A_n < \lim_{n\to\infty} B_n \le 0$, then

$$\lim_{n\to\infty} \frac{G(\sqrt{n} A_n)}{G(\sqrt{n} B_n)} = 0. \tag{3.8}$$

Proof. The conclusion is obvious except when there exists an infinite subsequence
of $B_n$ for which $\sqrt{n} B_n \to -\infty$. In that case, by applying the first term
of the asymptotic expansion for $G(t)$ for negative values of $t$, namely

$$G(t) \sim \frac{1}{\sqrt{2\pi}\,|t|}\, e^{-t^2/2}, \tag{3.9}$$

to the left member of (3.8), it becomes

$$\lim_{n\to\infty} \frac{B_n}{A_n} \exp\{-\tfrac12 n (A_n^2 - B_n^2)\}. \tag{3.10}$$

But since

$$0 \le \lim_{n\to\infty} \frac{B_n}{A_n} < 1, \qquad \lim_{n\to\infty} (A_n^2 - B_n^2) > 0,$$

the value of (3.10) is zero.
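Both the expansion (3.9) and the conclusion (3.8) can be inspected numerically. The sketch below is an illustration only; the helper names `tail_ratio` and `lemma_ratio` are assumptions, and the lemma is checked just for constant sequences $A_n = a < B_n = b < 0$.

```python
from math import erfc, exp, pi, sqrt


def G(t):
    """Standard normal distribution function, via erfc for tail accuracy."""
    return 0.5 * erfc(-t / sqrt(2.0))


def tail_ratio(t):
    """G(t) divided by the first term of the expansion (3.9), for t < 0."""
    first_term = exp(-0.5 * t * t) / (sqrt(2.0 * pi) * abs(t))
    return G(t) / first_term


def lemma_ratio(a, b, n):
    """G(sqrt(n) a) / G(sqrt(n) b) for constant sequences with a < b < 0."""
    return G(sqrt(n) * a) / G(sqrt(n) * b)


for t in (-2.0, -5.0, -10.0, -30.0):
    print(t, tail_ratio(t))       # approaches 1 from below as t decreases

for n in (4, 25, 100):
    print(n, lemma_ratio(-1.0, -0.5, n))   # shrinks rapidly with n
```

The first loop shows how quickly the one-term expansion becomes accurate, and the second shows the exponential rate at which the ratio in (3.8) vanishes when the two limits are separated.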
LEMMA 3.2. If we denote by $\gamma$ the value of $\min_i (\varphi_{i+1} - \varphi_i)$, then the $\bar c_i^{(n)}$ which
determine the minimax decision intervals, satisfy

$$\liminf_{n\to\infty} (\varphi_i - \bar c_{i-1}^{(n)}) \ge \tfrac12\gamma \tag{3.11}$$

and

$$\liminf_{n\to\infty} (\bar c_i^{(n)} - \varphi_i) \ge \tfrac12\gamma, \tag{3.12}$$

and at least one of the above holds as an equality.

Proof. Let $j$ be the smallest integer for which $\gamma = \varphi_{j+1} - \varphi_j$. Now consider
the decision procedure determined by

$$\begin{aligned}
c_i &= \varphi_i + \tfrac12\gamma && \text{for } i \le j, \\
c_i &= \varphi_{i+1} - \tfrac12\gamma && \text{for } i > j,
\end{aligned} \tag{3.13}$$

where we select $\varphi_1$ if $\bar X < c_1$, $\varphi_i$ if $c_{i-1} < \bar X < c_i$ for $1 < i < k$, and $\varphi_k$ if
$\bar X > c_{k-1}$. This is the Bayes procedure corresponding to the a priori distribution
$g_1^{(n)}, g_2^{(n)}, \cdots, g_k^{(n)}$ given by

$$\log \frac{g_i^{(n)}}{g_{i+1}^{(n)}} = \tfrac12 n (\varphi_{i+1} - \varphi_i)(2c_i - \varphi_{i+1} - \varphi_i), \quad i = 1, 2, \cdots, k - 1, \qquad \sum_{i=1}^{k} g_i^{(n)} = 1. \tag{3.14}$$

For any $n$, the minimum risk is $G(-\sqrt{n}\,\gamma/2)$ and the maximum risk does not
exceed $2G(-\sqrt{n}\,\gamma/2)$. By Lemma 1.1 the minimax risk must lie between these
two values. Hence

$$1 \le \frac{G[\sqrt{n}(\bar c_{i-1}^{(n)} - \varphi_i)] + G[\sqrt{n}(\varphi_i - \bar c_i^{(n)})]}{G(-\sqrt{n}\,\gamma/2)} \le 2. \tag{3.15}$$

If (3.11) or (3.12) were false, then, by Lemma 3.1, for a sufficiently large $n$ the
fraction in (3.15) would become greater than 2, contradicting (3.15). If neither
(3.11) nor (3.12) held as an equality, the fraction would become smaller than 1,
again contradicting (3.15).
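The risk bounds used in this proof can be checked directly for the procedure (3.13). The following sketch is illustrative only; `interval_risks` and `lemma_3_2_end_points` are hypothetical helper names, indices are zero-based, and unit variance is assumed as in the rescaled problem.

```python
from math import erfc, sqrt


def G(t):
    """Standard normal distribution function, via erfc for tail accuracy."""
    return 0.5 * erfc(-t / sqrt(2.0))


def interval_risks(phi, ends, n):
    """Probability of a wrong decision under each mean phi[i].

    The rule selects phi[i] when the sample mean falls between ends[i-1]
    and ends[i], with the outermost intervals unbounded.
    """
    root_n = sqrt(n)
    risks = []
    for i, mean in enumerate(phi):
        left = G(root_n * (ends[i - 1] - mean)) if i > 0 else 0.0
        right = G(root_n * (mean - ends[i])) if i < len(phi) - 1 else 0.0
        risks.append(left + right)
    return risks


def lemma_3_2_end_points(phi):
    """End points (3.13), with j the first position of the smallest gap."""
    gaps = [b - a for a, b in zip(phi, phi[1:])]
    gamma = min(gaps)
    j = gaps.index(gamma)
    ends = [phi[i] + gamma / 2 if i <= j else phi[i + 1] - gamma / 2
            for i in range(len(gaps))]
    return ends, gamma


phi, n = [0.0, 1.0, 3.0], 9
ends, gamma = lemma_3_2_end_points(phi)
risks = interval_risks(phi, ends, n)
base = G(-sqrt(n) * gamma / 2)
print(ends, [r / base for r in risks])
```

For this choice of means the printed ratios lie between 1 and 2, which is the sandwich that Lemma 1.1 then transfers to the minimax risk in (3.15).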

LEMMA 3.3. The minimax decision interval end points, $\bar c_i^{(n)}$, and the components
of the least favorable a priori distributions, $\bar g_i^{(n)}$, satisfy

$$\liminf_{n\to\infty} \frac{\bar g_{i+1}^{(n)}}{\bar g_i^{(n)}} = \liminf_{n\to\infty} \frac{G[\sqrt{n}(\varphi_i - \bar c_i^{(n)})]}{G[\sqrt{n}(\bar c_i^{(n)} - \varphi_{i+1})]}, \tag{3.16}$$

and (3.16) also holds if the lower limits are replaced by upper limits.

Proof. For any given $i$ and $n$, the $\bar c_i^{(n)}$, $\bar g_i^{(n)}$, $\bar g_{i+1}^{(n)}$ satisfy

$$\frac{\bar g_{i+1}^{(n)}}{\bar g_i^{(n)}} = \exp\{-\tfrac12 n[(\bar c_i^{(n)} - \varphi_i)^2 - (\bar c_i^{(n)} - \varphi_{i+1})^2]\}. \tag{3.17}$$

Since the minimax risk approaches zero as $n$ approaches infinity, the $\bar c_i^{(n)}$ must
eventually satisfy $\varphi_i < \bar c_i^{(n)} < \varphi_{i+1}$.
Now by using (3.9) the right member of (3.16) becomes

$$\liminf_{n\to\infty} \exp\{-\tfrac12 n[(\varphi_i - \bar c_i^{(n)})^2 - (\bar c_i^{(n)} - \varphi_{i+1})^2]\} \cdot \frac{\varphi_{i+1} - \bar c_i^{(n)}}{\bar c_i^{(n)} - \varphi_i}. \tag{3.18}$$

Since both factors in (3.18) approach their lower limits through the same subsequences,
it can be written as

$$\liminf_{n\to\infty} \frac{\bar g_{i+1}^{(n)}}{\bar g_i^{(n)}} \cdot L, \tag{3.19}$$

where

$$L = \liminf_{n\to\infty} \frac{\varphi_{i+1} - \bar c_i^{(n)}}{\bar c_i^{(n)} - \varphi_i}. \tag{3.20}$$

Now if $L = 1$, then we have (3.16). If $L < 1$, then from (3.18) we see that the
right member of (3.16) is zero, and by (3.17) the left member of (3.16) is also
zero. If $L > 1$, then from (3.17) and (3.18) it follows that both members of
(3.16) become infinite. The remainder of the lemma is proved in exactly the
same manner by replacing lower limits by upper limits throughout the above.

In several of the lemmas and theorems to follow, it will be convenient to refer
to those $\varphi_i$ which belong to a set of means of $\Omega$ satisfying Condition A given below.
In stating the condition and in the remainder of this section except when otherwise
stated, the value of $\min_i (\varphi_{i+1} - \varphi_i)$ will be denoted by $\gamma$.

Condition A. A set of means of $\Omega$ with consecutive subscripts, say $\varphi_s, \varphi_{s+1},
\cdots, \varphi_{s+t}$, will be said to satisfy Condition A if $\varphi_{i+1} - \varphi_i = \gamma$ for $i = s, s + 1,
\cdots, s + t - 1$.

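LEMMA 3.4. If $\varphi_i$ belongs to a set of means satisfying Condition A, then

$$\lim_{n\to\infty} (\bar c_i^{(n)} - \varphi_i) = \tfrac12\gamma, \quad s \le i < s + t, \tag{3.21}$$

and

$$\lim_{n\to\infty} (\varphi_i - \bar c_{i-1}^{(n)}) = \tfrac12\gamma. \tag{3.22}$$

Proof. Suppose that

$$\limsup_{n\to\infty} (\bar c_\alpha^{(n)} - \varphi_\alpha) > \tfrac12\gamma \tag{3.23}$$

for some $\alpha$ satisfying $s \le \alpha < s + t$. Then from

$$(\varphi_{\alpha+1} - \bar c_\alpha^{(n)}) + (\bar c_\alpha^{(n)} - \varphi_\alpha) = \gamma \tag{3.24}$$

we get

$$\liminf_{n\to\infty} (\varphi_{\alpha+1} - \bar c_\alpha^{(n)}) < \tfrac12\gamma. \tag{3.25}$$

Since (3.25) contradicts Lemma 3.2, we have

$$\limsup_{n\to\infty} (\bar c_\alpha^{(n)} - \varphi_\alpha) \le \tfrac12\gamma. \tag{3.26}$$

By Lemma 3.2 we have also

$$\liminf_{n\to\infty} (\bar c_\alpha^{(n)} - \varphi_\alpha) \ge \tfrac12\gamma, \tag{3.27}$$

and (3.21) follows. Then from (3.21) and (3.24) we get (3.22).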
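LEMMA 3.5. If $\varphi_i$ and $\varphi_{i+1}$ belong to a set of means satisfying Condition A, then

$$\limsup_{n\to\infty} \frac{G[\sqrt{n}(\varphi_i - \bar c_i^{(n)})]}{G[\sqrt{n}(\bar c_i^{(n)} - \varphi_{i+1})]} \le 4 \tag{3.28}$$

and

$$\liminf_{n\to\infty} \frac{G[\sqrt{n}(\varphi_i - \bar c_i^{(n)})]}{G[\sqrt{n}(\bar c_i^{(n)} - \varphi_{i+1})]} \ge \tfrac14. \tag{3.29}$$

Proof. As in the proof of Lemma 3.2, we have

$$\frac{G[\sqrt{n}(\bar c_{i-1}^{(n)} - \varphi_i)] + G[\sqrt{n}(\varphi_i - \bar c_i^{(n)})]}{G(-\sqrt{n}\,\gamma/2)} \le 2. \tag{3.30}$$

Therefore

$$\limsup_{n\to\infty} \frac{G[\sqrt{n}(\varphi_i - \bar c_i^{(n)})]}{G(-\sqrt{n}\,\gamma/2)} \le 2. \tag{3.31}$$

By applying (3.9) to (3.31) we obtain

$$\limsup_{n\to\infty} \exp\{-\tfrac12 n[(\varphi_i - \bar c_i^{(n)})^2 - \tfrac14\gamma^2]\} \cdot \frac{\tfrac12\gamma}{\bar c_i^{(n)} - \varphi_i} \le 2. \tag{3.32}$$

For any $i$ satisfying $s \le i < s + t$, from Lemma 3.4 we have that $\bar c_i^{(n)} - \varphi_i \to
\tfrac12\gamma$. Therefore

$$\limsup_{n\to\infty} \exp\{\tfrac12 n\gamma(\varphi_i - \bar c_i^{(n)} + \tfrac12\gamma)\} \le 2 \tag{3.33}$$

and finally

$$\limsup_{n\to\infty} \exp\Big\{n\gamma\Big(-\bar c_i^{(n)} + \frac{\varphi_i + \varphi_{i+1}}{2}\Big)\Big\} \le 4. \tag{3.34}$$

Again from (3.9) the left hand member of (3.28) can be written as

$$\limsup_{n\to\infty} \exp\{-\tfrac12 n[(\varphi_i - \bar c_i^{(n)})^2 - (\bar c_i^{(n)} - \varphi_{i+1})^2]\} \cdot \frac{\varphi_{i+1} - \bar c_i^{(n)}}{\bar c_i^{(n)} - \varphi_i}, \tag{3.35}$$

and by Lemma 3.4, (3.35) becomes

$$\limsup_{n\to\infty} \exp\Big\{n\gamma\Big(-\bar c_i^{(n)} + \frac{\varphi_i + \varphi_{i+1}}{2}\Big)\Big\}. \tag{3.36}$$

Then from (3.34) we get

$$\limsup_{n\to\infty} \frac{G[\sqrt{n}(\varphi_i - \bar c_i^{(n)})]}{G[\sqrt{n}(\bar c_i^{(n)} - \varphi_{i+1})]} \le 4. \tag{3.37}$$

From (3.30) we also have

$$\limsup_{n\to\infty} \frac{G[\sqrt{n}(\bar c_i^{(n)} - \varphi_{i+1})]}{G(-\sqrt{n}\,\gamma/2)} \le 2, \tag{3.38}$$

and in exactly the same way we obtain

$$\limsup_{n\to\infty} \frac{G[\sqrt{n}(\bar c_i^{(n)} - \varphi_{i+1})]}{G[\sqrt{n}(\varphi_i - \bar c_i^{(n)})]} \le 4, \tag{3.39}$$

from which (3.29) follows.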
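LEMMA 3.6. If $\varphi_i$ belongs to a set of means satisfying Condition A, and $s < i <
s + t$, then

$$\limsup_{n\to\infty} \frac{G[\sqrt{n}(\varphi_i - \bar c_i^{(n)})]}{G[\sqrt{n}(\bar c_{i-1}^{(n)} - \varphi_i)]} \le 4 \tag{3.40}$$

and

$$\liminf_{n\to\infty} \frac{G[\sqrt{n}(\varphi_i - \bar c_i^{(n)})]}{G[\sqrt{n}(\bar c_{i-1}^{(n)} - \varphi_i)]} \ge \tfrac14. \tag{3.41}$$

Proof. Since for any given $T < 0$, the function $G(y) + G(T - y)$ takes on its
minimum value when $y = T/2$, we have

$$2G(-\sqrt{n}\,\gamma/2) \le G[\sqrt{n}(\varphi_i - \bar c_i^{(n)})] + G[\sqrt{n}(\bar c_i^{(n)} - \varphi_{i+1})] \tag{3.42}$$

for $s \le i < s + t$. Now as in the proof of Lemma 3.2, we have that $r(\bar\delta_n) \le
2G(-\sqrt{n}\,\gamma/2)$. Hence

$$\liminf_{n\to\infty} \frac{G[\sqrt{n}(\varphi_i - \bar c_i^{(n)})]}{G[\sqrt{n}(\bar c_{i-1}^{(n)} - \varphi_i)]} \ge \liminf_{n\to\infty} \frac{G[\sqrt{n}(\varphi_i - \bar c_i^{(n)})]}{G[\sqrt{n}(\bar c_i^{(n)} - \varphi_{i+1})]}. \tag{3.43}$$

Then by Lemma 3.5, the last expression is equal to or greater than $\tfrac14$. Similarly
we obtain

$$\limsup_{n\to\infty} \frac{G[\sqrt{n}(\varphi_i - \bar c_i^{(n)})]}{G[\sqrt{n}(\bar c_{i-1}^{(n)} - \varphi_i)]} \le \limsup_{n\to\infty} \frac{G[\sqrt{n}(\varphi_{i-1} - \bar c_{i-1}^{(n)})]}{G[\sqrt{n}(\bar c_{i-1}^{(n)} - \varphi_i)]} \le 4. \tag{3.44}$$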


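LEMMA 3.7. If $\{a_n\}$ and $\{b_n\}$ are two sequences of real numbers such that
$\lim_{n\to\infty} a_n = -\gamma/2$, and

$$\lim_{n\to\infty} \frac{G(\sqrt{n}\, a_n)}{G(\sqrt{n}\, b_n)} = 1, \tag{3.45}$$

then

$$\lim_{n\to\infty} \frac{G[-\sqrt{n}(a_n + \gamma)]}{G[-\sqrt{n}(b_n + \gamma)]} = 1. \tag{3.46}$$

Proof. By applying (3.9) to the left member of (3.45), we get

$$\lim_{n\to\infty} \exp\{-\tfrac12 n(a_n^2 - b_n^2)\} \cdot \frac{b_n}{a_n} = 1.$$

This implies that

$$\lim_{n\to\infty} n(a_n^2 - b_n^2) = 0, \qquad \lim_{n\to\infty} a_n = \lim_{n\to\infty} b_n = -\tfrac12\gamma, \tag{3.47}$$

from which we get

$$\lim_{n\to\infty} n(a_n - b_n) = 0. \tag{3.48}$$

By applying (3.9) to (3.46) we must show that

$$\lim_{n\to\infty} \exp\{-\tfrac12 n[(a_n + \gamma)^2 - (b_n + \gamma)^2]\} \cdot \frac{b_n + \gamma}{a_n + \gamma} = 1, \tag{3.49}$$

which can be written as

$$\lim_{n\to\infty} \exp\{-\tfrac12 n(a_n - b_n)(a_n + b_n + 2\gamma)\} \cdot \frac{b_n + \gamma}{a_n + \gamma} = 1. \tag{3.50}$$

But this last line follows from (3.47) and (3.48).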

THEOREM 3.1. If all the means are equally spaced, that is, if $\varphi_{i+1} - \varphi_i = \gamma$
for $i = 1, 2, \cdots, k - 1$, then $\liminf_{n\to\infty} \bar g_i^{(n)} > 0$ for all $i$, and the minimax procedure
is asymptotically admissible.

Proof. By Lemmas 3.3 and 3.5 we get that

$$\tfrac14 \le \liminf_{n\to\infty} \frac{\bar g_{i+1}^{(n)}}{\bar g_i^{(n)}} \le 4, \quad i = 1, 2, \cdots, k - 1. \tag{3.51}$$

Since $\sum_{i=1}^{k} \bar g_i^{(n)} = 1$, it follows that $\liminf_{n\to\infty} \bar g_i^{(n)} > 0$ for all values of $i$. Then
by Theorem 2.2 we have that the minimax procedure is asymptotically admissible.

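THEOREM 3.2. If $\Omega$ has a mean, $\varphi_\alpha$, such that $\min\{(\varphi_\alpha - \varphi_{\alpha-1}), (\varphi_{\alpha+1} - \varphi_\alpha)\} >
\gamma$, then $\liminf_{n\to\infty} \bar g_\alpha^{(n)} = 0$, and the minimax procedure is asymptotically inadmissible.
(For $\alpha = 1$ take $\varphi_0 = -\infty$ and for $\alpha = k$ take $\varphi_{k+1} = +\infty$.)

Proof. By Lemma 3.2, we have that

$$\liminf_{n\to\infty} (\varphi_\alpha - \bar c_{\alpha-1}^{(n)}) = \tfrac12\gamma \tag{3.52}$$

or

$$\liminf_{n\to\infty} (\bar c_\alpha^{(n)} - \varphi_\alpha) = \tfrac12\gamma. \tag{3.53}$$

If (3.52) holds, then since $(\varphi_\alpha - \varphi_{\alpha-1}) > \gamma$, by Lemmas 3.1 and 3.3, we have
$\liminf_{n\to\infty} \bar g_\alpha^{(n)}/\bar g_{\alpha-1}^{(n)} = 0$. Hence $\liminf_{n\to\infty} \bar g_\alpha^{(n)} = 0$. Similarly if (3.53) holds,
we get that $\limsup_{n\to\infty} \bar g_{\alpha+1}^{(n)}/\bar g_\alpha^{(n)} = \infty$, and therefore $\liminf_{n\to\infty} \bar g_\alpha^{(n)} = 0$.
To show that the minimax procedure is asymptotically inadmissible, consider
the decision procedure, $\delta_n$, having decision interval end points $c_i^{(n)}$, where

$$\begin{aligned}
c_i^{(n)} &= \bar c_i^{(n)} && \text{for } i = 1, 2, \cdots, \alpha - 2, \alpha + 1, \cdots, k - 1, \\
c_i^{(n)} &= \tfrac12(\varphi_i + \varphi_{i+1}) && \text{for } i = \alpha - 1, \alpha.
\end{aligned} \tag{3.54}$$

Clearly $r_i(\delta_n) = r(\bar\delta_n)$ for $i < \alpha - 1$ and for $i > \alpha + 1$. Since $r(\bar\delta_n) \ge G(-\sqrt{n}\,\gamma/2)$,
by Lemma 3.1 we get

$$\lim_{n\to\infty} \frac{G[\sqrt{n}(\varphi_i - c_i^{(n)})]}{r(\bar\delta_n)} = 0, \quad i = \alpha - 1, \alpha, \tag{3.55}$$

$$\lim_{n\to\infty} \frac{G[\sqrt{n}(c_i^{(n)} - \varphi_{i+1})]}{r(\bar\delta_n)} = 0, \quad i = \alpha - 1, \alpha, \tag{3.56}$$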
and 


(3.57) 


which completes the proof. 

Theorems 3.1 and 3.2 cover all possible positions of the $k$ means except when
they can be put into two or more sets, such that each set contains two or more
consecutive means, the consecutive means in each set differ by $\gamma$, and all the
differences between consecutive means not belonging to the same set are greater
than $\gamma$. Throughout the remainder of this section, except when otherwise stated,
only those cases when the means of $\Omega$ fall into such sets will be considered. The
number of such sets will be denoted by $r$ and the number of means in the $i$th
set will be denoted by $n_i$. Thus $\sum_{i=1}^{r} n_i = k$. The $j$th mean in the $i$th set will
be denoted by $\varphi_{i,j}$ and the component of the least favorable a priori distribution
corresponding to this mean for samples of $n$ will be denoted by $\bar g_{i,j}^{(n)}$. The right
end point of the minimax decision interval for selecting $\varphi_{i,j}$ for samples of $n$
will be denoted by $\bar c_{i,j}^{(n)}$.

THEOREM 3.3. If $r \ge 2$ and all the sets do not contain the same number of means,
say $n_j < \max_i(n_i)$, then $\liminf_{n\to\infty} \bar g_{j,t}^{(n)} = 0$ for all $t$, and the minimax procedure
is asymptotically inadmissible.

Proof. Suppose


al) _— ° 
(3.58) lim sup GlVin(Grinmj1 — #54] el . 9 
oo (dn) 
then since (¥j1 — ¢j-1.n;.,;) > y, by Lemmas 3.1 and 3.3 we get that 
lim inf, .. i? = = 0, and by Lemmas 3.3 and 3.5 we get that lim inf, ... gs? = 0 


for all ¢. 
If 7 = 1, or if the upper limit in (3.58) is zero, then 


GLY nleos = @44)) 
(3.59) mn op G[V/nleja — é 1— 63, 3) ~ 


for any q such that n, = max;,(n,). ‘Since g,.2 — ¢)1 = ¢j2 — ¢ia = Y, With 
the use of Lemmas 3.4 and 3.7, we get 


(3.60) lim inf GV née — Pq, — ¥a2)) , 


awe Gl nes? pe oy. )) 
Then since


a 


(3 61) r(bn) G[Vn(e7 Anil q.2)| + G[-V/nl¢,.2 / ey] 
=o Gln? — ¢52)] + GV nia — 67)], 


it follows that 
— pin) 
(3.62 lim sup GlVnleos — 3 )] * 1. 
— G[Vn¢i.2 — Cj,2 )) 
In the case when j < r, by continuing in this manner we finally reach 
oo a(n) 
(3.63) lim sup GV nan cam] = 1. 
eee G[V nGi.n; — ee 
But by Lemma 3.6 
a(n) 


(3.64) lim inf oN ere, = Bel > 0. 
nwo Tba 


Hence 
2 ltd ehOP 
(3.65) lim sup Gv n(Giny = 6m) > 0. 
_ r(b,) 


Then as in the earlier part of this proof, we get that lim inf,.. 957 = 0 for 
all ¢. 
If 7 = r, then when we reach 


ae G[V/ ner, -1 or n;)] 
(3.60 im int SV eee ok 2, 
" n> G[V/n(es 1 a $j.n;)| 


it follows that 


n+0 G[V/ (eo, -1 = Ya.n;)| 


contrary to Lemma 3.6. Hence when j = r, we must have the previous case 
with (3.58) being satisfied. 

To prove that the minimax procedure is asymptotically inadmissible, first
consider the problems $P_1, P_2, \cdots, P_r$, where $\Omega_i$ for $P_i$ consists only of the
distribution functions corresponding to the means $\varphi_{i,j}$ of the $i$th set. Now define
the decision procedure $\delta_n$ for the original problem by taking for
$c_{i,j}^{(n)}$ ($1 \le j < n_i$), the right end point of the minimax decision interval for selecting
$\varphi_{i,j}$ for $P_i$ and taking $c_{i,n_i}^{(n)} = \tfrac12(\varphi_{i,n_i} + \varphi_{i+1,1})$ for $i = 1, 2, \cdots, r - 1$. When
some of the means of $\Omega$ are deleted, the minimax risk for such a sub-problem is
clearly smaller than the minimax risk for the original problem. Therefore

$r_{i,j}(\delta_n) < r(\bar\delta_n)$ for all $i$ and $1 < j < n_i$. But for all $i$,
—_— 
lim GL nln. a] = 0 
_ r(x) 


,’ 
tim GL mlee, = gonad] _ ¢
ne r(,) 


r; rein) 
ae? i) 


for all 2, j. 


Now since n; < n,, the minimax risk for P; is less than the minimax risk 
for P, , and we have 


a a(n) 
(3.68) lim sup Gl nla = &4)] ea) < 
nae G[Wn(var — &1)] 
If this upper limit is 1, then there exists an infinite subsequence of these ratios 


which converges to this limit. Let the sequence of values of n corresponding to 
this subsequence be denoted by {t}. Then by Lemmas 3.4 and 3.7, we have 


(3.69) = G[Vt (Ca? — ¥.2)] 


Then since 


(3.70) tim ELV LG sa = GAN jig LWEG? — 952] + ClVt@s2 — 22)] 
ene VE @ea = ED) GLEE? — Gea] + OVE ea — ED) 


by use of Lemma 3.6, we have 
tee 
(3.71) lim Glvt ia — 22)] - 
—_ GIVt &e.2 — Gs | 


Continuing in this manner we finally reach 


 GIvt jn; — Eins] _ 

aie lim GEV EWP j.nj — Cin j)) 

we tm GiVilees, =e] 
~(t) 


But since (%¢,n; — .n;) — —%y and for all ¢, 
-(t) 
Ce oe Cin 5) *% 3(¢5.n; — $541.1) < — 47, 


(3.72) contradicts Lemma 3.1. In the case when j = r, the contradiction is ob- 
tained by replacing 2‘, by «. Thus we have shown that 


; GV nia — 2)] 
. li Gi+/nlees — 2) 
(3.73) ‘2 GLVnleea — &>)] 


 % 


But since the minimax risk for $P_r$ is the maximum of the minimax risks for the
sub-problems, we have


’ py ae e") 
(3.74) tim AW aleea — Fei) 
— r(6n) 
Now, in view of 


(3.75) tim GY nGirinj1 — esa) _ 


r(,) 


we get 


(3.76) lim sup - rs. rya(bn de 
oe 6G.) 


The existence of this asymptotically subminimax decision procedure proves
that the minimax procedure is asymptotically inadmissible.

It will be seen in the proof of Theorem 3.7 that $\{\bar\delta_n\}$ is also asymptotically
admissible.

THEOREM 3.4. If there are only two sets of means with the same number of means 
in each set, then all the components of the least favorable a priori distribution have 
positive lower limits, and the minimax procedure is asymptotically admissible. 

PROOF. Let $n_1 = n_2 = \bar n$. Since the minimax risks are equal, we have for all $n$,


(3.77) ai - ay = — $2,7-i+1 » +3 4-1, 
and finally 

(3.78) gia — OR = OP — gaa. 

Hence 


(3.79) Ce = Hera t+ ¢2,), 


and from (3.7) we get that gi? = gS‘? . It then follows from Lemmas 3.3 and 


3.5 that all the components of g‘” have lower limits greater than zero. Then 
by Theorem 2.2 the minimax procedure is asymptotically admissible. 

THEOREM 3.5. If all the sets have the same number of means but the means are
not symmetrical with respect to the point $\tfrac12(\zeta_{1,1} + \zeta_{r,n_r})$, then all the $\hat g_{i,j}^{(n)}$ of at least
one set have zero as their lower limits.

PROOF. Let the number of means in each set be $\bar n$ and let $\gamma_i = \zeta_{i+1,1} - \zeta_{i,n_i}$.
Let $j$ be the smallest integer for which $\gamma_j \ne \gamma_{r-j}$. The hypothesis assures the
existence of such a $j$. Since all the minimax risks are equal, we have


a(n) a(n) 


res Cit = Ga ai—1 — Orin 


a(n) a(n) 
¢1,2 — ¢i, - 6,8 ai—2 — $r,ni-1 


~(n) a(n) 
epa' t34.™ Gnd a — Or—j4+i.1 - 





Now in order for g§¥ , #Sti.1, @3.4, and gS")... to have lower limits greater 
than zero, Lemmas 3.1 and 3.3 require that 


(3.81) lim (93,4 — ¢§3) = —41; 


n~o 


and 


(3.82) lim (7). = Gr—j41,1) = —dyr-3 . 

In view of the last line of (3.80), and since $\gamma_j \ne \gamma_{r-j}$, (3.81) and (3.82) cannot
both hold. Finally, since at least one component of $\hat g^{(n)}$, say $\hat g_{\alpha,\beta}^{(n)}$, has zero for its
lower limit, all the components of $\hat g^{(n)}$ corresponding to means of the $\alpha$th set
also have zero for their lower limits.

Although under the conditions of Theorem 3.5 the lower limits of some of
the components of $\hat g^{(n)}$ are zero, nevertheless the following theorem shows that
the minimax procedure is asymptotically admissible.

THEOREM 3.6. If all the sets contain the same number of means, $\bar n$, then the
minimax procedure is asymptotically admissible.

PROOF. Consider the sequence of decision procedures, $\{\bar\delta_n\}$, defined as in the
proof of Theorem 3.3. Since all the sets contain the same number of means, the
$r_{i,j}(\bar\delta_n)$ $(1 \le i \le r,\ 1 < j < \bar n)$ are equal for all $n$. Denote their common value by $r(\bar\delta_n)$.
We have


(3.83) lim CLV nein — 2%) 


~ = 0, 
——— r(dn) 


and 


(3.84) tim Vn ie = eivs)} _ 9, 
noe r(d,) 
Hence 
° 7:,j(5n) aie 
(3.85) ere es 


for all 7, 7, and consequently 


. r(bn) = 
(3.86) neat i =1 


Now suppose $\{\delta_n\}$ is an asymptotically subminimax procedure with

(3.87)    $\limsup_{n\to\infty} \dfrac{r_{\alpha,\beta}(\delta_n)}{r(\hat\delta_n)} < 1,$

where $r_{\alpha,\beta}(\delta_n)$ is the risk associated with the $\beta$th mean of the $\alpha$th set. Consider
the sub-problem, $P_\alpha$, with $\Omega_\alpha = \{N(\zeta_{\alpha,j}, 1)\}$, $(j = 1, 2, \dots, \bar n)$. The
minimax risk for $P_\alpha$ is $r(\bar\delta_n)$. In view of (3.86), the procedure derived from $\delta_n$ by
using only the end points $c_{\alpha,j}^{(n)}$ $(j = 1, 2, \dots, \bar n - 1)$ would be an asymptotically
subminimax procedure for $P_\alpha$. This is in contradiction to Theorem 3.1, which
completes the proof.

By Theorems 3.2 and 3.3 we see that the minimax procedure is undesirable
for large values of $n$ when $\Omega$ consists of $k$ normal distributions with a known
common variance and different means, except for the special cases covered by
Theorems 3.1 and 3.6.

In order to compare the minimax procedure with an asymptotically subminimax
procedure for some specific values of $n$, consider the decision problem where
$\Omega = \{N(0, 1), N(.2, 1), N(1, 1)\}$. It is easily seen that for $n = 100$, the
minimax risk satisfies the inequality

(3.88)    $G(-1) < r(\hat\delta_{100}) < G(-1) + G(-7).$

Now consider the asymptotically subminimax procedure, $\{\bar\delta_n\}$, given by
$\bar c_1^{(n)} = \hat c_1^{(n)}$, $\bar c_2^{(n)} = \tfrac12(\zeta_2 + \zeta_3) = .6$. For $n = 100$, we get

(3.89)    $r_1(\bar\delta_{100}) = r(\hat\delta_{100}), \qquad r_2(\bar\delta_{100}) < r(\hat\delta_{100}) + G(-4), \qquad r_3(\bar\delta_{100}) = G(-4).$

Hence

(3.90)    $r_1(\bar\delta_{100})/r(\hat\delta_{100}) = 1, \qquad r_2(\bar\delta_{100})/r(\hat\delta_{100}) < 1 + G(-4)/G(-1) < 1.0002, \qquad r_3(\bar\delta_{100})/r(\hat\delta_{100}) < G(-4)/G(-1) < .0002.$

Clearly $\bar\delta_{100}$ is a more desirable decision procedure than the minimax procedure.
Even for $n$ as small as 25 it is similarly found that

(3.91)    $r_1(\bar\delta_{25})/r(\hat\delta_{25}) = 1, \qquad r_2(\bar\delta_{25})/r(\hat\delta_{25}) < 1.08, \qquad r_3(\bar\delta_{25})/r(\hat\delta_{25}) < .08,$

which shows that the minimax procedure might be undesirable even for
moderately small values of $n$.
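
The numerical bounds quoted for $n = 100$ and $n = 25$ can be checked directly. The following is a minimal sketch, assuming that $G$ is the standard normal distribution function and that the minimax risk exceeds $G(-\tfrac{1}{10}\sqrt{n})$ in this example, as (3.88) states for $n = 100$. The names `G`, `MEANS`, and `tail_ratio_bound` are illustrative and do not come from the paper.

```python
from math import erfc, sqrt

def G(x):
    """Standard normal distribution function (assumed meaning of G)."""
    return 0.5 * erfc(-x / sqrt(2.0))

# Means of the three normal distributions in the example.
MEANS = (0.0, 0.2, 1.0)

def tail_ratio_bound(n):
    """Upper bound for r_3 of the modified procedure over the minimax risk.

    The numerator is G[sqrt(n) (c2 - zeta_3)] with c2 the midpoint of
    zeta_2 and zeta_3; the denominator is the lower bound
    G[-sqrt(n) (zeta_2 - zeta_1) / 2] for the minimax risk.
    """
    root_n = sqrt(n)
    c2 = 0.5 * (MEANS[1] + MEANS[2])
    numerator = G(root_n * (c2 - MEANS[2]))
    denominator = G(-root_n * 0.5 * (MEANS[1] - MEANS[0]))
    return numerator / denominator

for n in (100, 25):
    print(n, tail_ratio_bound(n))
```

Under these assumptions the two printed ratios fall below .0002 and .08 respectively, in line with the last lines of (3.90) and (3.91).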

Now that we have determined the asymptotic behavior of the minimax procedure
for the class of problems for which $\Omega = \{N(\zeta_i, 1)\}$, and since we have
seen that the minimax procedure is most frequently asymptotically inadmissible,
the question naturally arises whether asymptotically admissible asymptotically
subminimax procedures exist in those cases. It will be seen that such
procedures do exist for some problems but not for others. For example, consider
the problem where $\Omega = \{N(\zeta_1, 1), N(\zeta_1 + \gamma, 1), N(\zeta_1 + \gamma + \gamma_1, 1)\}$ with
$\gamma_1 > \gamma > 0$. For this problem we know from Theorem 3.2 that the minimax
procedure is asymptotically inadmissible, and if $\{\delta_n\}$ given by $\{c_1^{(n)}, c_2^{(n)}\}$ is an
asymptotically subminimax procedure, then


(5n bn 
(3.92) im a) ti a) 


and 
(3.93) lim sup os <i. 


It will now be shown that it is always possible to find another asymptotically 
subminimax procedure, {5,}, given by {@”, @”} with 


: Ta(bn 
94 a 
(8.94) lim 3(5n) . 


Hence no asymptotically admissible asymptotically subminimax procedure 
exists for this problem. 
For the procedure, 6, , we have 


(3.95) lin Ge) tim Gv nl — 41 i) - 


a — 
noe (3) 2-0 Ol alon — é")| 


Consequently, by Lemma 3.7, we have 


, Glvinlei” - - v2) _ 
. I r 
(3.96) ae G[Vn(a” — >) 


Also 
(3.97) lim 2") — jim Gla” — | + Glv'nlee = 2" )] _ 


nae 7(b,) = G[r/n(e\” — o)] + Gln — &)) 


and since ¢{” — g. > —4y and g — ¢$” — —y + }y < —}y, we have 


— stm) 
(3.98) lim GV lon = 63 *.. 
n> G[-V/n( ee — ol 
By use of (3.96), (3.97), and (3.98), we get 


ng. — es” )| _ 
3.99 tim GV 
( ) oa Gl-V/ nek” 7 | 


Denoting (ek” — go) — (¢. — ec") by d, , (3.99) becomes 


(3.100) lim ee n(er” _ ¢: — d,)| = 0. 
nee G[v/n(é” — o)] 


Now by use of (3.9) we have 


(3.101) lim exp {— 4n[(é{” — ¢ — d,)* — (41" — @)° *\}- 


n~s2
Since


a(n 


) 
(3.102) lim inf sig enn > 0, 
O20. A. TP. G2 "Gn 


we get finally 


(3.103) lim exp {— 3n(—d,)[2(e” — ¢.) — d,J]} = 0. 


nww 


But 


(3.104) lim (é{” — g) = —}y 


nwo 


and 


(3.105) lim inf d, 2 


n-o 


From (3.103) we then get 
(3.106) lim nd, = ©: 


nwo 
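Now consider the sequence of procedures, $\{\tilde\delta_n\}$, defined by
(3.107)    $\tilde c_1^{(n)} = c_1^{(n)}, \qquad \tilde c_2^{(n)} = c_2^{(n)} - \tfrac12 d_n.$
Clearly $r_1(\tilde\delta_n) = r_1(\delta_n)$. Also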


Vn” = @)] + Avner — &”)] 


Gl nler — 04” + 4de)) 1 4 jim nei” — ¢2 — 44.) 
GlVn(er” — ¢2)] ae G[-/nle — os] 


a(n) 
ee 
a(n) 
Ci — ¢ — 3d, 
) a g 
: 9/26 ee 
1+ limexp {— }nl—34,JI2({” — 2) — 4d,)} « =o = 1, 
nooo ci —¢— 3d, 


lim ra(5n) = 


1 + limexp { — 4n[(é{” — ¢. — 3d,)° — (4 — ¢o) |} > 
n~@® 


and 


tim 72%») — tim GlYnte” — ed) _ 4, Clv¥nle” — o — 44,)] 


= lim — 
now rg(5,) 20 GlV/n(cr” — gs)] > G[Vnler” — ¢s)| 
cf" — va 


lim exp {— 4n[(cf” — gs — 3d,)° — (3 — ¢s)']} - 
no ~— 3d, 


(n) 


C2 


(n) 
j n Ce _— 
lim exp {— 4n[—4d,][2(c!” — 93) — 4dn]} - = oe Q, 
n+>o Co a ¢3 — $d, 


The existence of such a $\{\tilde\delta_n\}$ shows that every asymptotically subminimax
procedure for this problem is asymptotically inadmissible.
However, asymptotically admissible asymptotically subminimax procedures
do exist for some problems. This will be brought out by the following theorem:

THEOREM 3.7. If the means can be separated into two or more sets, each
containing two or more consecutive means, with a common difference, $\gamma_i$, between the
consecutive means of the $i$th set, with $\zeta_{i+1,1} - \zeta_{i,n_i} > \tfrac12(\gamma_i + \gamma_{i+1})$ for all $i$, and

(a) the $\gamma_i$ are not all equal;
or

(b) all the $\gamma_i$ are equal but not all the sets contain the same number of means;
then there exists an asymptotically subminimax procedure which is asymptotically
admissible.

PROOF. Consider the problems, $P_1, P_2, \dots, P_r$, defined as in the proof of
Theorem 3.3, and the decision procedure $\bar\delta_n$ defined there, except we shall now
take

(3.108)    $\bar c_{i,n_i}^{(n)} = \tfrac12(\zeta_{i,n_i} + \zeta_{i+1,1}) + \tfrac14(\gamma_i - \gamma_{i+1}),$
for $i = 1, 2, \dots, r - 1$.
Suppose (a) is satisfied, then since 


Viag a". oP bse 47; 


=(n) 


Cin, ~ Gia < — Fin» 


(3.109) 


we have for all 7, 


(3.110) lim G[Vnlein, — 2] 


Nn = 
n+>2 G[-V/ nla a1 Gin) 


and 


7 OV alee, — vera] 
(3.111) Bia epee : = 0. 
n»0 Gl n(oiss, 1 — ati)] 


Hence if we denote the maximum of the minimax risks for the r sub-problems 
by r(5,), we have 


(3.112) lim sup “2 rin) = i. 


oom r 5.) 


But since r(5,) < r(é,), we have 


(3.113) — ridlbn) — 
: —— r(,.) = 
for all 2, 7. 


Now denote min,(y;) by y» . In case (a) there must exist ay. > ye, and by 
(3.111) we have 


eg _ GV nlyoa — 25)] 
= lim 
(3.114) ee ri ) on aati: G[V/ nl. — a4 Fm 29°
But by Lemmas 3.1 and 3.4 it follows that the right member of (3.114) is zero,
which proves that $\{\bar\delta_n\}$ is asymptotically subminimax.

In the case when (b) is satisfied, $\{\bar\delta_n\}$ was shown to be asymptotically
subminimax in the proof of Theorem 3.3.

In either case, suppose there exists another sequence of decision procedures,
$\{\delta_n\}$, given by $\{c_{i,j}^{(n)}\}$, such that

(3.115)    $\limsup_{n\to\infty} \dfrac{r_{i,j}(\delta_n)}{r_{i,j}(\bar\delta_n)} \le 1$

for all $i, j$, and the strict inequality holds for at least one set of values of $i, j$,
say $\alpha, \beta$. Then the procedure derived from $\delta_n$ by using only the end points $c_{\alpha,j}^{(n)}$
$(j = 1, 2, \dots, n_\alpha - 1)$ would be an asymptotically subminimax procedure
for $P_\alpha$, contrary to Theorem 3.1.


ESTIMATION OF THE MEAN AND STANDARD DEVIATION
BY ORDER STATISTICS. PART III 


A. E. SARHAN 
University of North Carolina 


1. Summary and Introduction. In a previous work [7], the mean and standard 
deviation were estimated by arranging all the sample elements in ascending 
order and taking the best linear combination of them. We will use here the same 
principle to estimate the mean and standard deviation of certain populations 
from singly and doubly censored samples. 

Censored samples may be considered as truncated samples having a known 
number of unmeasured (missing) observations, i.e., those in which the total 
number of sample elements is known, but measurements on some of which are 
lacking. 

In life testing, fatigue testing, and in other tests of a destructive nature, we 
have n items drawn at random from some population which when subjected to a 
test, fail in order of time. To save time and/or items, it is often required to 
stop the experiment (to censor the sample) after recording the first r (<n) 
observations. This is a censored sample from the right. 

Again, censored samples are found frequently in biological data where some 
of the observations in a sample are either below or above a limit in the measure 
used. The values beyond this limit are believed to form a continuation of the 
scale of measurement but are unmeasurable in the experiment [6]. For example, 
in experimental biology, n samples from each animal are tested for antibodies 
after a certain period of time. Only r of these samples contain measurable 
amounts while (n — r) of the animals develop the antigen at a level too low for 
measurement by the prevailing technique. This is a censored sample from the 
left. 

In fact, the estimation of the mean and standard deviation based on the linear 
combination of all sample elements is a special case, and the general one is con- 
sidered here. 

Censored samples were considered recently in the work of Ipsen [6], Walsh
[8], Hald [4], Gupta [3], Cohen [1], Halperin [5], and Epstein et al. [2].


2. Rectangular population. The frequency distribution of the rectangular
population is

(2.1)    $f(y) = \dfrac{1}{\theta_2}, \qquad \theta_1 - \theta_2/2 \le y \le \theta_1 + \theta_2/2.$

Consider the case where the observations on the smallest $r_1$ and largest $r_2$
sample elements are missing and only those on the middle $n - r_1 - r_2$ sample
elements are known; then we will have
Ty > 2 

Ti +. 1 


(22) V* = (n+ 1)(n + 2) 


n+1 n—f—1 
(n — 12)(r2 + 1) n— Te 


eer 1 
'V 1 is ba eigen 
(A A) (n + 2)(n — 11 — fre — 1) 


(r; + 1)(n — 2r. — 1) + (re + 1l)(n — 2rn, -— 1) 


re OF 


2 


(A' v4) A’ = et 


(n — 1 — f— 1) 
(n — 2re — 1) n—2r,—-—1 
2(n + 1) 2(n + 1) 


on} as 1 
Hence,

(2.5)    $\theta_1^* = \dfrac{1}{2(n - r_1 - r_2 - 1)} \left[(n - 2r_2 - 1)\, y_{(r_1+1|n)} + (n - 2r_1 - 1)\, y_{(n-r_2|n)}\right]$

and

(2.6)    $\theta_2^* = \dfrac{n + 1}{n - r_1 - r_2 - 1} \left[y_{(n-r_2|n)} - y_{(r_1+1|n)}\right].$

From (2.3) the variances of the estimates are

(2.7)    $V(\theta_1^*) = \dfrac{(r_1 + 1)(n - 2r_2 - 1) + (r_2 + 1)(n - 2r_1 - 1)}{4(n + 1)(n + 2)(n - r_1 - r_2 - 1)}\, \theta_2^2,$

(2.8)    $V(\theta_2^*) = \dfrac{r_1 + r_2 + 2}{(n + 2)(n - r_1 - r_2 - 1)}\, \theta_2^2.$

The relative efficiencies are the following:

(2.9)    relative efficiency of $\theta_1^* = \dfrac{2(n - r_1 - r_2 - 1)}{(r_1 + 1)(n - 2r_2 - 1) + (r_2 + 1)(n - 2r_1 - 1)},$

(2.10)    relative efficiency of $\theta_2^* = \dfrac{2(n - r_1 - r_2 - 1)}{(n - 1)(r_1 + r_2 + 2)}.$

The efficiency is calculated relative to the best linear estimate using all the
sample elements.
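
The variance and efficiency formulas for the rectangular population can be evaluated exactly with rational arithmetic. The sketch below is illustrative only: the function names are not from the paper, and it assumes the reading of (2.7) and (2.8) in which the variances are the stated multiples of $\theta_2^2$, together with $\theta_2^2 = 12\sigma^2$ for a rectangular population.

```python
from fractions import Fraction

def var_theta1(n, r1, r2):
    """Coefficient of theta_2^2 in V(theta_1*), as in (2.7)."""
    numerator = (r1 + 1) * (n - 2 * r2 - 1) + (r2 + 1) * (n - 2 * r1 - 1)
    return Fraction(numerator, 4 * (n + 1) * (n + 2) * (n - r1 - r2 - 1))

def var_theta2(n, r1, r2):
    """Coefficient of theta_2^2 in V(theta_2*), as in (2.8)."""
    return Fraction(r1 + r2 + 2, (n + 2) * (n - r1 - r2 - 1))

def efficiency(var_fn, n, r1, r2):
    """Efficiency relative to the uncensored case r1 = r2 = 0."""
    return var_fn(n, 0, 0) / var_fn(n, r1, r2)

# With sigma = 1 the range satisfies theta_2^2 = 12, and sigma* = theta_2*/sqrt(12),
# so V(theta_1*) = 12 * var_theta1 and V(sigma*) = var_theta2.
RANGE_SQUARED = 12

for n, r1, r2 in [(5, 1, 1), (5, 1, 2), (4, 1, 1)]:
    print(n, r1, r2,
          float(RANGE_SQUARED * var_theta1(n, r1, r2)),
          float(efficiency(var_theta1, n, r1, r2)),
          float(var_theta2(n, r1, r2)),
          float(efficiency(var_theta2, n, r1, r2)))
```

For $n = 5$, $r_1 = r_2 = 1$ this gives a variance of $2/7$ for both estimates and efficiencies of $1/2$ and $1/4$, which can be compared with the rectangular row of Table IV.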
As a special case, if $r_2 = 0$, i.e., the observations on the smallest $r_1$ sample
elements are missing and we know only the largest $(n - r_1)$ observations, we will
get

(2.11)    $\theta_1^* = \dfrac{1}{2(n - r_1 - 1)} \left[(n - 1)\, y_{(r_1+1|n)} + (n - 2r_1 - 1)\, y_{(n|n)}\right]$

and

(2.12)    $\theta_2^* = \dfrac{n + 1}{n - r_1 - 1} \left[y_{(n|n)} - y_{(r_1+1|n)}\right]$

with variances

(2.13)    $V(\theta_1^*) = \dfrac{n r_1 + 2n - 3r_1 - 2}{4(n + 1)(n + 2)(n - r_1 - 1)}\, \theta_2^2$

and

(2.14)    $V(\theta_2^*) = \dfrac{r_1 + 2}{(n + 2)(n - r_1 - 1)}\, \theta_2^2.$

Similarly, for the other special case, i.e., where the observations on the largest
$r_2$ sample elements are missing and we have only those on the smallest $(n - r_2)$,
we will have

(2.15)    $\theta_1^* = \dfrac{1}{2(n - r_2 - 1)} \left[(n - 2r_2 - 1)\, y_{(1|n)} + (n - 1)\, y_{(n-r_2|n)}\right]$

and

(2.16)    $\theta_2^* = \dfrac{n + 1}{n - r_2 - 1} \left[y_{(n-r_2|n)} - y_{(1|n)}\right]$

with variances

(2.17)    $V(\theta_1^*) = \dfrac{n r_2 + 2n - 3r_2 - 2}{4(n + 1)(n + 2)(n - r_2 - 1)}\, \theta_2^2$

and

(2.18)    $V(\theta_2^*) = \dfrac{r_2 + 2}{(n + 2)(n - r_2 - 1)}\, \theta_2^2.$
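
The right-censored estimates use only the smallest and the largest known observation. As a minimal sketch (the function names are hypothetical, and the weights are taken as read from (2.15) and (2.16)), the code below builds the two pairs of weights and checks unbiasedness using the standard fact that, for the rectangular density (2.1), $E[y_{(i|n)}] = \theta_1 - \theta_2/2 + \theta_2\, i/(n+1)$.

```python
from fractions import Fraction

def right_censored_weights(n, r2):
    """Weights on (y_(1|n), y_(n-r2|n)) for theta_1* and theta_2*."""
    m = n - r2 - 1  # positive when at least two observations are known
    if r2 < 0 or m < 1:
        raise ValueError("need 0 <= r2 <= n - 2")
    mean_weights = (Fraction(n - 2 * r2 - 1, 2 * m), Fraction(n - 1, 2 * m))
    range_weights = (Fraction(-(n + 1), m), Fraction(n + 1, m))
    return mean_weights, range_weights

def expected_order_statistic(i, n, theta1, theta2):
    """E[y_(i|n)] for a rectangular population centered at theta1 with range theta2."""
    theta1, theta2 = Fraction(theta1), Fraction(theta2)
    return theta1 - theta2 / 2 + theta2 * i / (n + 1)

def expected_estimates(n, r2, theta1, theta2):
    """Expected values of theta_1* and theta_2* under right censoring."""
    (a1, a2), (b1, b2) = right_censored_weights(n, r2)
    low = expected_order_statistic(1, n, theta1, theta2)
    high = expected_order_statistic(n - r2, n, theta1, theta2)
    return a1 * low + a2 * high, b1 * low + b2 * high

print(right_censored_weights(5, 2))
print(expected_estimates(7, 2, 3, 4))
```

When $2r_2 = n - 1$ the weight on the smallest observation in the estimate of the mean vanishes, so the mean is estimated by the largest known observation alone.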

The results in this case and in the previous one are related. If the smallest $r_1$
observations are missing and the $(n - r_1)$ largest are known, then we can use the
same estimates (2.15) and (2.16) for estimating $\theta_1$ and $\theta_2$ by letting $y_{(1|n)} >
y_{(2|n)} > \cdots > y_{(n-r_1|n)}$. Then the coefficients for constructing the best linear
estimate of $\theta_1$ will be identical with those given in (2.15). For the case of $\theta_2$, the
coefficients will be numerically the same but with opposite sign. The variances
will be the same if $r_2$ is replaced by $r_1$. This, of course, applies for all symmetric
distributions.


In particular, if we have

(2.19)    $f(y) = \dfrac{1}{\theta_2}, \qquad 0 \le y \le \theta_2,$

and the smallest $r_1$ and largest $r_2$ observations are missing, we will get

(2.20)    $\theta_2^* = \dfrac{n + 1}{n - r_2}\, y_{(n-r_2|n)}$

with variance

(2.21)    $V(\theta_2^*) = \dfrac{r_2 + 1}{(n - r_2)(n + 2)}\, \theta_2^2.$

The relative efficiency is

(2.22)    relative efficiency of $\theta_2^* = \dfrac{n - r_2}{n(r_2 + 1)}.$

If the largest $r_2$ observations only are missing, we will get results exactly as
above.

Furthermore, if the smallest $r_1$ observations are missing, we will get results
exactly as those obtained by using all the sample elements, i.e.,

(2.23)    $\theta_2^* = \dfrac{n + 1}{n}\, y_{(n|n)},$

(2.24)    $V(\theta_2^*) = \dfrac{1}{n(n + 2)}\, \theta_2^2.$

The efficiency of the estimate in this case is 1 and there will be no effect of the
missing values on the estimate.


3. Other symmetric distributions. The estimates of the mean and standard
deviation of some other symmetric distributions, their variances and their
relative efficiencies from singly and doubly censored samples are also worked
out. These are worked out only for samples up to size 5 and are tabulated in
Tables I, II, III, and IV. These distributions are:


(3.1) u-shaped ti = —, ib —SySA+h 
2 


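(3.2)    rectangular    $f(y) = \dfrac{1}{\theta_2}, \qquad \theta_1 - \tfrac12\theta_2 \le y \le \theta_1 + \tfrac12\theta_2$

(3.3)    parabolic    $f(y) = \dfrac{6\,(y - \theta_1 + \tfrac12\theta_2)(\theta_1 + \tfrac12\theta_2 - y)}{\theta_2^3}, \qquad \theta_1 - \tfrac12\theta_2 \le y \le \theta_1 + \tfrac12\theta_2$

(3.4)    triangular    $f(y) = \dfrac{4}{\theta_2^2}\left(\tfrac12\theta_2 - |y - \theta_1|\right), \qquad \theta_1 - \tfrac12\theta_2 \le y \le \theta_1 + \tfrac12\theta_2$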

(3.5) double exponential fy) = 5 ge aerie “esy 


The values for the estimates of the normal distribution are also given in order 
to be used in the discussion. 


4. Discussion. Table I is constructed to give the coefficients of the best
linear estimate of the mean and standard deviation in singly censored samples
(from the right) in different symmetric populations up to samples of size 5. If
the sample is censored from the left, i.e., the smallest $r_1$ observations are missing
and the $(n - r_1)$ largest are known, we can use the same table for estimating the
mean and standard deviation of the given populations by letting $y_{(1|n)} > y_{(2|n)} >
\cdots > y_{(n-r_1|n)}$. Then the coefficients for constructing the best linear estimate
of the mean will be identical with those given in Table I. For the case of standard
deviation the coefficients will be numerically the same, but with opposite signs.
The table shows that: 

(1) The coefficients of the largest known observation (i.e., $y_{(n-r_2|n)}$) in the
estimate of the mean are greater than the corresponding coefficients if the sample was
not censored. This is true for all values of $n$ and in different populations. In
fact, the coefficient of $y_{(n-r_2|n)}$ is greater than that of $y_{(n|n)}$ for the same
population and sample size except in the case of the u-shaped distribution for $n = 3$,
$r_2 = 1$.

(2) For a fixed $n$, as the number of the missing observations ($r_2$) increases, the
coefficients of $y_{(n-r_2|n)}$ increase while those of the smallest observation decrease
towards zero and then take gradually decreasing negative values. An interesting
case is that of the coefficients in the estimate of the mean of the rectangular
population. For fixed $n$, as $r_2$ increases, the coefficient of the largest known
observation increases and that of the smallest one decreases. When $2r_2 = n - 1$,
the smallest element will have zero weight. Hence in this case, the mean of the
rectangular population is estimated by the largest known observation.

(3) The coefficients of $y_{(n-r_2|n)}$ in the estimate of the mean for samples of

TABLE II

Variances and efficiencies of the best linear estimates of the mean ($\theta_1^*$)
and standard deviation ($\sigma^*$) from singly censored samples
in different populations ($\sigma = 1$)


Variance of OM Efficiency | Variance of «* | Efficiency 


5 . 1674434 
Rectangular y ; . 1428571 
Parabolic : ; . 1572840 
Triangular ‘ 3. | .1719055 
Normal a ’ ‘ . 19476 
D. Expon. : 8 99. .3097230 


U-shaped an : .5574709 
Rectangular 4 3. . 2857142 
Parabolic ‘ j : . 2778808 
Triangular .3184536 60. . 2838755 
Normal . 28393 52.80 .31809 

D. Expon. 1724911 91. .4634527 





U-shaped 2.2113788 3.3 .7935410 
Rectangular 1 .0000000 . - 7142855 
Parabolic . 7920637 22. .6414834 
Triangular . 7002408 6: .6330318 17.05 
Normal .61123 2.73 .69571 

D. Expon. 2743320 3: .8481830 8.03 


U-shaped .4119197 : . 3482922 
Rectangular .3500000 : . 2500000 44. 
Parabolic .3261310 ; 2594385 51. 
Triangular . 3090288 9. .2737111 55. 
Normal . 28701 ; . 30208 59.6 
D. Expon. . 1860330 98. .3339501 89 . 4: 


U-shaped .4622735 8. .3411707 7.2 
Rectangular 8000000 | 25. .6666667 16. 
Parabolic .6573040 35. .6225662 21. 
Triangular . 5832768 ‘ .6114363 24. 
Normal .51299 ; .67303 26. 
D. Expon. .3335692 2. 2 .9457112 31. 


U-shaped .5340045 
Rectangular .6000000 
Parabolic .5324677 
Triangular .4928568 
Normal 44867 

D. Expon. |  .3194197 


.3204021 14.68 
-6000000 33. 

.5887268 37.73 
.5986393 40 .34 
.63783 43.19 
.8760051 49 .33 




Efficiency is calculated relative to the best linear estimate using all sample elements.

TABLE III
Coefficients in the best linear estimate of the mean and standard deviation based
on the order statistics $y_{(i)}$ from doubly censored samples of size $n$ in different
populations; for the mean $\theta_1^* = \sum_{i=r_1+1}^{n-r_2} \beta_{1i}\, y_{(i)}$,
for the standard deviation $\sigma^* = \sum_{i=r_1+1}^{n-r_2} \beta_{2i}\, y_{(i)}$





” rn ra | Population Biz Bis Bus Boa Bra Bu 


5 |1]1] U-shaped | .5595857) —.1191714) .5595857| — .7832027 0 | .7832027 
| Rectangular |.5000000 0 | .5000000) — .8660253) . 8660253 

| | Parabolic | .4509135| .0981730| .4509135| —.9211128, 0 9211128 

| Triangular | .4051809) .1896383| .4051809| — .9614817| .9614817 
D. Expon. | .2377907| .5244186| .2377907|—1.234223 | 0 = |1.234223 


| U-shaped | .000000 | —1.5664142|1 5664142 
| Rectangular | 1.000000 | —1.7320506|1 .7320506 
Parabolic | .000000 | |—1.8422256 1 8422256 
Triangular 1.000000 | pea? |—1.9229645|1.9229645 


| D. Expon. | | .000000 | |\—2.4684456'2 4684456 


| 
| U-shaped |—1.3053363/1.3053363 5000000} .5000000 
| Rectangular |— 1 .4233755|1 .4233755| —_ .5000000 5000000! 
Parabolic | |—1.5351864|1.5351864, 5000000] 5000000) 


| | Triangular 


| —1.6024692|1.6024692|  .5000000!  .5000000) 


| | 
|—2 0571088'2.0571088 -5000000) - 5000000) 





size 4 and 5 are the largest in the case of the u-shaped and decrease gradually for 
the other distributions in an order indicated by their arrangement in the table. 
The coefficients of the smallest observation for the different populations change 
in the same order. 

(4) The coefficients of the largest known element $y_{(n-r_2|n)}$ and of the smallest
observation in the best linear estimate of the standard deviation for all populations
are greater and smaller, respectively, than the coefficients of the corresponding
sample elements if the samples were not censored. Also, for a fixed sample size and
for the same distribution, as $r_2$ increases, the coefficient of the largest known
element becomes larger and the coefficient of the smallest observation becomes smaller.

Table II is constructed to give the variances and the efficiencies of the estimates 
of the mean and standard deviation of the different symmetric populations from 
singly censored samples up to n = 5. The efficiencies are calculated relative to 
the best linear estimate based on all the sample elements. 

This table shows that: 

(5) For a fixed $n$, as $r_2$ increases, the efficiency of both the estimate of the mean
and standard deviation decreases.

(6) For every fixed value of $n$ and $r_2$, the efficiency of both estimates is
lowest for the u-shaped distribution, increases in the rectangular, then the
parabolic, the triangular, and the normal, and is greatest in the case of the
double exponential.

TABLE IV
Variances and efficiencies of the best linear estimate of the mean ($\theta_1^*$)
and standard deviation ($\sigma^*$) from doubly censored samples
in different populations ($\sigma = 1$)

 n | r1 | r2 | Population  | Variance of θ1* | Efficiency | Variance of σ* | Efficiency
 5 | 1  | 1  | U-shaped    | .3261912        | 20.71      | .3536064       | 13.30
   |    |    | Rectangular | .2857140        | 49.99      | .2857140       | 25.00
   |    |    | Parabolic   | .2620758        | 68.30      | .3013740       | 30.71
   |    |    | Triangular  | .2471006        | 78.27      | .3206393       | 33.67
   |    |    | D. Expon.   | .1587055        | 99.82      | .4386776       | 52.16

   |    |    | U-shaped    | .7217100        |  9.36      | 1.1284812      |  4.17
   |    |    | Rectangular | .4285714        | 33.33      | .7142857       | 10.00
   |    |    | Parabolic   | .3611096        | 49.57      | .7146330       | 12.96
   |    |    | Triangular  | .3198053        | 60.48      | .7300379       | 14.79
   |    |    | D. Expon.   | .1755004        | 90.23      | .8935550       | 25.61

 4 | 1  | 1  | U-shaped    | .5343228        | 23.95      | .9346768       | 10.22
   |    |    | Rectangular | .4000000        | 50.00      | .6666667       | 16.67
   |    |    | Parabolic   | .354731         | 65.27      | .6755588       | 19.78
   |    |    | Triangular  | .3278568        | 74.53      | .6948200       | 21.79
   |    |    | D. Expon.   | .2100694        | 98.91      | .4256344       | 70.16

Efficiency is calculated relative to the best linear estimate using all sample elements.


In the latter distribution, if the middle sample element is included among the 
missing observations, the efficiency becomes very low and loses its relatively 
high standing among the symmetric distributions. 

(7) The relative efficiency of the estimate of standard deviation of a given 
population is less than the corresponding efficiency of the estimate of the mean 
of the same population. This indicates that the estimate of standard deviation is 
affected more than the estimate of the mean by missing observations. 

Table III is constructed to give the coefficients in the best linear estimate of 
the mean and standard deviation from doubly censored samples while table IV 
gives the variances of these estimates and their relative efficiencies. Table III 
shows that: 

(8) The coefficients of the largest known observation vary in the different
populations in the order indicated previously. The case of the normal distribution
is not indicated, but one would expect that for different values of $n$, $r_1$, $r_2$
the coefficients of the estimates will take values between those of the triangular
and those of the double exponential.

(9) When the middle observation and the next to it in order are the only
observations known, then only the middle one is used to estimate the mean. Table
IV gives the same picture as Table II and one would expect that the efficiencies
of the estimates of the normal population will lie between those of the triangular
and those of the double exponential.

5. Exponential population. The frequency distribution of the exponential
population is

(5.1)    $f(y) = \dfrac{1}{\sigma}\, e^{-(y - \mu)/\sigma}, \qquad \mu \le y \le \infty.$

Consider the case where the observations on the smallest $r_1$ and largest $r_2$
elements are missing; then we will have,


(52) V" = 


1 
Ss TS oa —(n — 1, — 1)? 0 tee 0 





















+ (n—i +1)? 


(n—1r, —1)?+ (n—17r, — 2)? —(n — vr, — 2)? 





(n — m1 — 2)? +’ (n — 7, — 3)? --- 


and 


sheila aati 
(n — 11 — fT2 — 1) 


ritl 1 2 ry+1 1 r+ 1 
Le (n—it J $e-n—9-) eis eis D 
oe ae 1 iad , 
inl (n — 2 + 1) 
Hence, the estimates and their variances are given by 


1 ryt+l 
p* =C {2 + (n — 1) Zz Latte jn) 


i=l (n —1 —71 <4 1)) 


(5.3) (A’V" A)" = 


(5.4) ri+l ri+l n—re 
—n 2 @-ie)pe™ Aoi 7 D ae, vi | 

(5.5) o* =C | >, Yijny — (n — T1)Yory+1/n) + nve-nse |, 

, on 1 2 ritl \ ‘ 
(5.6) vor fe] oats | +E ace 
(5.7) V(o*) = Co’. 

The estimate of the mean is 
|} Yori41/n) + {e sil ae = i} {in — T1)Ycry+1/n) 

C fmt (n — 2 + 1) sy 

(5.8) 


n—re \ 
aac Zz i rater} | 


t=—ri+1
with variance


rit+l 2 rit+l 
et lets 


¢ n—1 ] CH (n-1 + 1) 
(5.9) . ri+l 
—2 1 
2. (n - —i4 me 1) x f 
where 
(5.10) C= A —— 





(na — T — fe — a 
The relative efficiencies of the estimates are 


relative efficiency of »* 


3 ee es . ae >| 
(5.11) = an /|4S wos + Bea eit " 


: 1 
- € ff * i a es Y 
(5.12) rel. eff. of o s 5/ ¢ 
Fi (Crit 1 2 
* — y ee — 
rel. eff. of (u* + 0%) = > o{| ¥ aor | 
(5.13) 1 ritl 1 ritl 1 \ 
Lo eet tii ‘ig ae se 
+o 2 @roipi mais TY: 


If we have the smallest r; observations only missing, we will get 


1 
u* ea | (n —_ 11) Ycryri/n) Se 2 _ Maal ”f 





(n “a> 1) t=ry+ 
(5.14) ' 
rit+l 1 
aEziTy te-4- Des 
and 
= l ~ 
(5.15) o* = Tn mer = 1D ae Yuin) — (n _ nue | 
with variances 
* o 
VG") = (n — 1 — 1) 
(5.16) : | 
| ri+l 1 ) 1) ¥ 1 | 
(2 S-at ps @- 8 Mawr 
and 
(5.17) Vie*) = ——_~ 


(n— 1% —1)°
The estimate of the mean will be


1 ( ri+l l 2. 
ine 5 Yiiln) / 
eal im (mn —i+ iy «a Tisiny | 


as =ri+l 


rit+l 1 \ 
+4(n — n) ze ( Yors+i/n) 


— (n-it+1) I 


(5.18) 


with variance 
o rit+l 1 ? 
_. mean It anata 
(5.19) 
ry+l 1 rit+l 1 
sll GF) tl susie 
2 @nae De y aca t?] 


For the case where the observations on the largest $r_2$ sample elements only are
missing, we will get

(5.20)    $\mu^* = \dfrac{1}{n(n - r_2 - 1)} \left[ n(n - r_2)\, y_{(1|n)} - \sum_{i=1}^{n-r_2} y_{(i|n)} - r_2\, y_{(n-r_2|n)} \right],$

(5.21)    $\sigma^* = \dfrac{1}{n - r_2 - 1} \left[ \sum_{i=1}^{n-r_2} y_{(i|n)} - n\, y_{(1|n)} + r_2\, y_{(n-r_2|n)} \right],$

with variances

(5.22)    $V(\mu^*) = \dfrac{n - r_2}{n^2 (n - r_2 - 1)}\, \sigma^2$

and

(5.23)    $V(\sigma^*) = \dfrac{\sigma^2}{n - r_2 - 1}.$

The estimate of the mean is

(5.24)    $\mu^* + \sigma^* = \dfrac{1}{n(n - r_2 - 1)} \left[ (n - 1) \sum_{i=1}^{n-r_2} y_{(i|n)} - n r_2\, y_{(1|n)} + r_2 (n - 1)\, y_{(n-r_2|n)} \right]$

with variance

(5.25)    $\dfrac{n^2 - n - r_2}{n^2 (n - r_2 - 1)}\, \sigma^2.$
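
The right-censored variances for the exponential population can be tabulated exactly. This is a sketch under stated assumptions: the function names are hypothetical, the variances are taken in units of $\sigma^2$, and the denominators are read as $n^2(n - r_2 - 1)$, which follows if the smallest observation (variance $\sigma^2/n^2$) is uncorrelated with $\sigma^*$.

```python
from fractions import Fraction

def var_sigma(n, r2):
    """V(sigma*) / sigma^2 when the largest r2 observations are missing."""
    return Fraction(1, n - r2 - 1)

def var_mu(n, r2):
    """V(mu*) / sigma^2 when the largest r2 observations are missing."""
    return Fraction(n - r2, n * n * (n - r2 - 1))

def var_mean(n, r2):
    """V(mu* + sigma*) / sigma^2 when the largest r2 observations are missing."""
    return Fraction(n * n - n - r2, n * n * (n - r2 - 1))

def var_mean_from_parts(n, r2):
    """Same variance built from y_(1|n) + ((n - 1)/n) sigma*, assuming no correlation."""
    return Fraction(1, n * n) + Fraction((n - 1) ** 2, n * n) * var_sigma(n, r2)

for r2 in (1, 2, 3):
    print(r2, float(var_mean(5, r2)), float(var_sigma(5, r2)))
```

For $n = 5$ the values of `var_mean` are $19/75$, $9/25$ and $17/25$ for $r_2 = 1, 2, 3$; these can be compared with the exponential entries .2533, .3600 and .6800 that appear in Table VII.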


In particular, if the distribution is

(5.26)    $f(y) = \dfrac{1}{\sigma}\, e^{-y/\sigma}, \qquad 0 \le y \le \infty,$

and the observations in the smallest $r_1$ and the largest $r_2$ are missing, we will get

(5.27)    $\sigma^* = \dfrac{1}{K} \left[ \left\{ \dfrac{\sum_{i=1}^{r_1+1} \frac{1}{n - i + 1}}{\sum_{i=1}^{r_1+1} \frac{1}{(n - i + 1)^2}} - (n - r_1) \right\} y_{(r_1+1|n)} + r_2\, y_{(n-r_2|n)} + \sum_{i=r_1+1}^{n-r_2} y_{(i|n)} \right]$

with variance

(5.28)    $V(\sigma^*) = \dfrac{\sigma^2}{K}$

where

(5.29)    $K = \dfrac{\left( \sum_{i=1}^{r_1+1} \frac{1}{n - i + 1} \right)^2}{\sum_{i=1}^{r_1+1} \frac{1}{(n - i + 1)^2}} + (n - r_1 - r_2 - 1)$

and

(5.30)    rel. eff. $(\sigma^*) = K/n$

where $K$ is given by (5.29).
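
The quantity $K$ is easy to evaluate for particular $n$, $r_1$, $r_2$. The following sketch assumes the reading of (5.29) in which $K$ is the squared sum of $1/(n-i+1)$ over $i = 1, \dots, r_1 + 1$, divided by the sum of the squared terms, plus $n - r_1 - r_2 - 1$; the function names are illustrative only.

```python
from fractions import Fraction

def K(n, r1, r2):
    """K of (5.29) for the one-parameter exponential population."""
    terms = [Fraction(1, n - i + 1) for i in range(1, r1 + 2)]
    first_moment = sum(terms)                 # sum of 1/(n-i+1)
    second_moment = sum(t * t for t in terms)  # sum of 1/(n-i+1)^2
    return first_moment ** 2 / second_moment + (n - r1 - r2 - 1)

def relative_efficiency(n, r1, r2):
    """Relative efficiency K/n of sigma*, as in (5.30)."""
    return K(n, r1, r2) / n

print(K(10, 0, 3), K(5, 1, 0), float(relative_efficiency(5, 1, 0)))
```

With $r_1 = 0$ the first term reduces to 1, so $K = n - r_2$ and the variance $\sigma^2/K$ becomes $\sigma^2/(n - r_2)$, the value given for right-censored samples in (5.34).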
If only the smallest r; sample elements are missing, we will get 


, rit+l l ( ryt+l l 2 
= < 
. rz ais hb? a-i+ | 
( rit+l 1 \ pn 1 
ve » |S (n—it+ ote (n —i+1) 
ri+l 7 n 
1 
/ 2 re ip —(n—- rp Yoritijny + \e,, var |, 


r 1 
. 2 << l 
V(o*) =o 
(o*) r iar 
(5.32) 


(fr,+1 l | oral } 
[2 (n — 2+ 1) 9A ae 2, (n—i+t Df 


For the case where the largest $r_2$ items only are missing, we will get

(5.33)    $\sigma^* = \dfrac{\sum_{i=1}^{n-r_2} y_{(i|n)} + r_2\, y_{(n-r_2|n)}}{n - r_2}$

and

(5.34)    $V(\sigma^*) = \dfrac{\sigma^2}{n - r_2}.$

The results (5.33) and (5.34) are exactly the same as the maximum likelihood
estimates obtained by Epstein and Sobel [2]. It has also been shown that the
estimate obtained has the minimum variance.

Table (VI) is constructed to give the coefficients of the mean and standard
deviation of the exponential population (5.1) from singly and doubly censored
samples. In singly censored samples, the coefficients of the largest known
observations in the estimates of the mean and standard deviation increase as $r_2$
increases, for fixed $n$, and those of the smallest observations decrease.

When the sample is censored from the left, as $r_1$ increases, the reverse situation

TABLE VII 
Variances of the best linear estimate of the mean and standard deviation 
in singly and doubly censored samples from the skewed distribution 
(4.1) and the exponential distribution (5.1) 


] 
Skewed Exponential 





| Variance of mean* Variance of o* ‘ Wertenes of mean* Variance of o* 


ee 
-0090718 | .1577897 | . 2533000 | 3333333 
.0124467 =| .2631055 .3600000 | 5000000 
.0244913 | .4627518 6800000 .0000000 


0100586 | .1836198 | .2033333 | 3333333 
0144719 | = .2911154 .2370830 |  .5000000 
0306681 |  .6593896 5438889 | 1.0000000 


0104491 3564131 .2537500 | — .5000000 
0131271 | .9637797 | .2605555 | 1.000000 
0146836 | .7282964 4050000 0000000 


.0123354 | .2678224 |  .3438000 .5000000 
0243523 | .6239451 | .6250000 .0000000 





0132724 2739464 | .2604166 5000000 
0260043 6424895 2152777 .0000000 


0117789 6891365 347222 0000000 
| 0199459 .5958348 | .5555555 .0000000 
| 0214751 | .6104800 | 3888888 0000000 


holds true for the coefficients in estimation of the mean. In estimation of the
standard deviation in this case, however, the coefficients of the largest known
observations increase whereas that of the smallest known observation is always $-1$.

In all cases, the coefficient of the largest known observation in the estimate of
the mean in samples censored from the right is greater than the coefficient of
the smallest known observation in samples censored from the left if the numbers
of missing observations are equal in the two cases.

Table (VII) gives the variances of the estimates of the mean and the standard
deviation of the same population. This table shows that the variance of the
estimate of the mean varies according to the number of missing observations and
to the side from which these are censored, whereas the variance of the estimate
of the standard deviation depends only on the total number of missing observations.


6. A skewed distribution. The coefficients in the best linear estimate of the 
mean and standard deviation of the skewed distribution 


6a) sw = 2(¥ 


2 
1) (3-45), 0-208 Sy 0+ 20/3 
from singly and doubly censored samples up to size 5 are given in table (V).

The same remarks are also to be noted here for the coefficients of the largest 
or smallest known observations. 

Table (VII) gives the variances of the estimates for samples up to size 5 and
different values of $r_1$ and $r_2$. For singly censored samples, the table shows that
the variances of the estimates vary according to the number of missing
observations and to the side from which they are censored.


7. Summary. The best linear estimates of the mean and standard deviation
of the rectangular and the exponential populations, their variances and relative
efficiencies from singly and doubly censored samples, are given for samples of
size n.

The same results are given for samples up to size 5 for some other symmetric
distributions. There is a certain pattern in the behavior of the coefficients of the
smallest and largest known observations as the number of unknown observations
increases in the same population. These coefficients also vary across populations
in a certain order according to their shapes. The effect of the tails of the
distribution on the estimates is also considered in the case of the skewed
distribution.

It is to be noted that, in some cases where sufficient statistics exist, the best 
linear estimates can be obtained quickly from them. 

Finally, the author wishes to acknowledge the kind help of Dr. B. G. Green- 
berg, under whose direction this work was done. 
THE MOMENTS OF THE SAMPLE MEDIAN


By John T. Chu and Harold Hotelling


University of North Carolina 


1. Summary. It is shown that under certain regularity conditions, the central 
moments of the sample median are asymptotically equal to the corresponding 
ones of its asymptotic distribution (which is normal). A method of approxima- 
tion, using the inverse function of the cumulative distribution function, is ob- 
tained for the moments of the sample median of a certain type of parent dis- 
tribution. An advantage of this method is that the error can be made as small as 
is required. Applications to normal, Laplace, and Cauchy distributions are dis- 
cussed. Upper and lower bounds are obtained, by a different method, for the 
variance of the sample median of normal and Laplace parent distributions. 
They are simple in form, and of practical use if the sample size is not too small. 


2. Introduction. Let a population be given with cdf (cumulative distribution
function) F(x) and pdf (probability density function) f(x), and median $\xi$ which
we assume to exist uniquely. Let $\tilde{x}$ denote the sample median of a sample of
size 2n + 1. Then the pdf g(x) of $\tilde{x}$ and the pdf h(x) of the asymptotic
distribution of $\tilde{x}$ are respectively

(1)  $g(x) = C_n [F(x)]^n [1 - F(x)]^n f(x),$

where $C_n = (2n + 1)!/(n!\,n!)$, and

(2)  $h(x) = (2\pi\bar{\mu}_2)^{-1/2} \exp\{-(x - \xi)^2 / 2\bar{\mu}_2\},$

where $\bar{\mu}_2 = \{4[f(\xi)]^2 (2n + 1)\}^{-1}$.

This asymptotic normality of $\tilde{x}$, when $f(\xi)$ is known or replaced by an
estimate, can be utilized to obtain approximate confidence intervals and significance
tests for $\xi$. Whether or not such approximations are acceptable in practice is
another matter. On the other hand one may use $\tilde{x}$ as a point estimate of $\xi$. Then
we would like to know the variance $\tilde{\mu}_2$ of $\tilde{x}$, since it is a conventional
measure of efficiency. In most cases, however, the exact value of $\tilde{\mu}_2$ is hard to
obtain. When looking for approximations, a general question that follows naturally is: Can
the moments of the asymptotic distribution of $\tilde{x}$ be used as approximations to
the corresponding moments of $\tilde{x}$, and if not, how can better approximations be found?
When the parent distribution is normal, this question has been answered by
various authors, e.g., Hojo [6], Pearson [8], [9] and more recently Cadwell [3]. It
has been stated, e.g., in [3], that experiments showed that the distribution of $\tilde{x}$
tends rapidly to normality, but the ratio $\tilde{\mu}_2/\bar{\mu}_2$ tends relatively slowly to 1.
Because of this slow convergence, approximations were derived for that ratio when
the sample size is small. While different methods were used by different authors,
their results agree fairly well with each other. In fact, the problem should be
considered as completely solved but for the unknown error committed in using
such approximations.

One of us [4] recently proved that the distribution of $\tilde{x}$, for a normal parent
distribution, does tend to normality rather "rapidly". Here in Section 6, Theorem
4, we obtain upper and lower bounds for the ratio $\tilde{\mu}_2/\bar{\mu}_2$. These bounds are fairly
close to each other if the sample size is not too small. It seems therefore that
even for sample sizes around 20 or so, $\bar{\mu}_2$ is not a bad approximation to $\tilde{\mu}_2$. It
becomes a very good approximation if the sample size is large. However, for
large samples, even "better" approximations are obtained by a different method,
in (49) and (56), which are also better lower bounds for $\tilde{\mu}_2$ (for all n) than the
one given in (57). (See Section 6, Remarks 1 and 2.)

Before further discussion, the following notations will be introduced. If f(x)
and g(x) are functions of x, then $E_f(g)$ denotes the expectation of g(x) with
respect to f(x), i.e., $\int g(x) f(x)\,dx$. We use, where f, g, and h are given by (1)
and (2),

ii = E,(zx), fi = E,(x) = &, 

m = E,(x — nn), = FE, (zx), 


and for any integer k = 2, 


(3) 


‘ - \k 4 ~ \k 
i E,(z ae fix) ’ = E,(a — fi) ’ 


(4) 


bk = E,(x — in)’, Mk Ej(x — ui)’. 

It should be pointed out that, although the pdf g(x) of $\tilde{x}$ tends to h(x) as the
sample size increases, the moments $\tilde{\mu}_k$ of $\tilde{x}$ in general do not necessarily tend to
$\bar{\mu}_k$. In fact $\tilde{\mu}_k$ may never exist [2]. Nevertheless, if the parent pdf satisfies certain
conditions, then it can be shown that $\tilde{\mu}_k$ and $\bar{\mu}_k$ are asymptotically equal
(Section 3, Theorem 1). Therefore under such circumstances, it is justifiable, at least
for large samples, to use $\bar{\mu}_k$ as an approximation to $\tilde{\mu}_k$.

If the parent distribution satisfies certain conditions, a general method is
obtained in Section 4 for computing $\tilde{\mu}_k$, k = 1, 2, ... . The method is based on
the Taylor expansion of x(F), the inverse function of F(x). For example, if
$x(F) - \xi = \sum_{m=1}^{\infty} a_m (F - \tfrac12)^m$ converges for 0 < F < 1, $a_m = O(2^m m^k)$ where
k ≥ 0, and f(x) is symmetric with respect to $x = \xi$, then when n > 2k + 3,

(5)  $\tilde{\mu}_2 \approx \int_0^1 S_m^2\, C_n F^n (1 - F)^n\, dF,$

where $C_n$ is given by (1) and $S_m = \sum_{r=1}^{m} a_r (F - \tfrac12)^r$ (Section 4, Theorem 3).
The error in such approximations can be bounded, and it tends to 0 as m tends to
∞. If the parent pdf is not symmetric, similar approximations can be obtained
(Section 4, (26)). Applications are given to the variances of the sample medians
of Laplace and Cauchy parent distributions (Section 4, Examples 1 and 2).

Finally upper and lower bounds are derived in Section 7 for the variance of
$\tilde{x}$ of a Laplace parent distribution. It then can be seen that for estimating the
mean of a Laplace distribution, the sample median is a "better" estimate than
the sample mean, not only for large samples, but for small samples as well.


3. Large sample moments.
LEMMA 1. If 0 < c ≤ 1/2, then for m, n = 1, 2, ...,

(6)  $\int_{1/2-c}^{1/2+c} |u - \tfrac12|^m u^n (1 - u)^n\, du = (\tfrac12)^{m+2n+1} \int_0^{4c^2} t^{(m-1)/2} (1 - t)^n\, dt.$

In particular, if c = 1/2, and $C_n = (2n + 1)!/n!\,n!$, we have for fixed m,

(7)  $\int_0^1 C_n |u - \tfrac12|^m u^n (1 - u)^n\, du = O(n^{-m/2}),$

(8)  $\int_0^1 C_n (u - \tfrac12)^{2m} u^n (1 - u)^n\, du = (\tfrac12)^{2m}\, \dfrac{1 \cdot 3 \cdots (2m - 1)}{(2n + 3)(2n + 5) \cdots (2n + 2m + 1)},$

(9)  $\int_0^1 C_n |u - \tfrac12|^{2m-1} u^n (1 - u)^n\, du = (\tfrac12)^{m}\, \dfrac{1 \cdot 3 \cdots (2n + 2m - 1)}{2 \cdot 4 \cdots (2n + 2m)} \cdot \dfrac{(m - 1)!}{(2n + 3)(2n + 5) \cdots (2n + 2m - 1)}.$

These formulae are easily proved using transformations $v = \pm(u - \tfrac12)$, etc.
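
Formula (8) of Lemma 1 can be checked in exact rational arithmetic. The sketch below is only an illustration, not part of the paper: it expands $(u - \tfrac12)^{2m}$ binomially and integrates term by term with the elementary Beta integral $\int_0^1 u^a (1-u)^b\,du = a!\,b!/(a+b+1)!$. The function names (`c_n`, `lhs_eq8`, `rhs_eq8`) are hypothetical helpers introduced here.

```python
from fractions import Fraction
from math import comb, factorial


def c_n(n):
    """C_n = (2n+1)!/(n! n!) as an exact fraction."""
    return Fraction(factorial(2 * n + 1), factorial(n) ** 2)


def beta_integral(a, b):
    """Exact value of the integral of u^a (1-u)^b over [0, 1] for integers a, b >= 0."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def lhs_eq8(m, n):
    """Left side of (8): integral of C_n (u - 1/2)^(2m) u^n (1-u)^n over [0, 1]."""
    total = Fraction(0)
    for j in range(2 * m + 1):
        # (u - 1/2)^(2m) = sum_j C(2m, j) u^j (-1/2)^(2m - j)
        total += comb(2 * m, j) * Fraction(-1, 2) ** (2 * m - j) * beta_integral(n + j, n)
    return c_n(n) * total


def rhs_eq8(m, n):
    """Right side of (8): (1/2)^(2m) * 1*3*...*(2m-1) / ((2n+3)(2n+5)...(2n+2m+1))."""
    num, den = 1, 1
    for i in range(1, m + 1):
        num *= 2 * i - 1
        den *= 2 * n + 2 * i + 1
    return Fraction(1, 2) ** (2 * m) * Fraction(num, den)
```

For m = 1 both sides reduce to 1/(4(2n + 3)), the variance of a Beta(n + 1, n + 1) variable, which is the quantity that drives the leading term of the variance of the sample median.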

THEOREM 1. Let a population be given with cdf F(x) and pdf f(x). Suppose that
the median $\xi$ of the given population exists uniquely and $f(\xi) \ne 0$, and f'(x) exists
and is bounded in some neighborhood of $x = \xi$. If $\tilde{x}$ is the sample median of a
sample of size 2n + 1, and $\tilde{\mu}_k$ and $\bar{\mu}_k$, as defined by (4), are respectively the k-th
central moment of $\tilde{x}$ and the corresponding one of its asymptotic distribution, then

(10)  $\lim_{n \to \infty} \tilde{\mu}_{2k-1} = \bar{\mu}_{2k-1},$

(11)  $\lim_{n \to \infty} \tilde{\mu}_{2k}/\bar{\mu}_{2k} = 1, \quad k = 1, 2, \ldots,$

provided that in each case, $\tilde{\mu}_{2k-1}$ and $\tilde{\mu}_{2k}$ are finite for at least one n. (The RHS
(right-hand side) of (10) is of course zero, excepting that $\bar{\mu}_1 = \xi$.)

Proor. We will prove (11) as an illustration of the method we use. (10) can 
be shown in the same way. Obviously 


; 2k-1 2k gh ode iy 
(12) ok = poe + » Se (—1)7* ps” y;, 
j= 


where (7°) = (2k)!/{j\(2k — j)!}. We say that. if jx is finite for a certainn = no, 







then fix is finite for all n = no, and 


(13) H2m—1 = O(n™), 


» _ (am\" 1-3 --- (2m — 1) ~m—1!2 
(14) rn = (2) (2n + 3)(2n + 5) +++ (Qn + 2m — p + OM ), 


form = 1,2, --- ,k, 


where a; = 1/f(). On combining (12), (13), and (14), it follows that 


2k 
15) ie = >) siesta me Me 8 3 en 
(15) fix (5 (2n + 3)(2n + 5) --- (Qn + 2k + 1) + O(n ). 


Since fx, = 1-3 --- (2k — 1)a@2, where jf is defined by (2), we have (11). 
To complete the proof, it remains to establish (13) and (14). Now for example, 


tin = [ @ — PCAN" — F@N¥@) dx 


(16) , 5 - 
-[ +] + | =I,+1,+ 1s, say, 


where $a < \xi$ and $b > \xi$ will be chosen later. For 0 < F < 1, the function
F(1 − F) is nonnegative, reaches its maximum 1/4 at F = 1/2, and is increasing
for 0 < F ≤ 1/2 and decreasing for 1/2 ≤ F ≤ 1. Let

(17)  $r = \max\{4F(a)[1 - F(a)],\; 4F(b)[1 - F(b)]\},$

then 0 < r < 1. Since $C_n = O(2^{2n} n^{1/2})$, it follows that

(18)  $I_1 + I_3 = O(n^{1/2} r^n).$

On the other hand, if a and b are so chosen that, e.g., $F(b) - \tfrac12 = \tfrac12 - F(a) = c$
is small, then for $\tfrac12 - c \le F \le \tfrac12 + c$, x(F), the inverse function of F(x), is
uniquely defined and may be expanded, by Taylor's method, into

(19)  $x(F) - \xi = a_1 (F - \tfrac12) + R_2 (F - \tfrac12)^2,$

where $a_1 = 1/f(\xi)$ and $R_2$ is the remainder. Substituting (19) for $x - \xi$ in $I_2$ of
(16), it can be shown, using Lemma 1, that $I_2$ is equal to the RHS of (14).
Combining this fact with (16) and (18), we obtain (14).

Regarding the above theorem, we make

REMARK 1. A sufficient condition for $\tilde{\mu}_k$ being finite for some $n = n_0$ (hence
all $n \ge n_0$) is that $\mu_k$ be finite. This condition, however, is not necessary. For
example, the variance of the sample median of a Cauchy parent distribution is
finite if the sample size 2n + 1 ≥ 5, though the variance of the parent
distribution is infinite (Section 4, Example 2).

REMARK 2. Theorem 1 states only some sufficient conditions under which (10)
and (11) are true. For a Laplace parent distribution, $f'(\xi)$ does not exist, yet (10)
and (11) hold (Section 4, Example 1).

The above theorem provides a justification, at least for large samples, for
using $\bar{\mu}_k$ as an approximation to $\tilde{\mu}_k$. In the next section we will proceed to show
that if the parent pdf satisfies some additional conditions, then satisfactory
approximations can be obtained for $\tilde{\mu}_k$ for samples of smaller sizes as well.


4. Approximations.

LEMMA 2. If k is real, then the following series is convergent for every positive
integer n > k,

(20)  $\sum_{m=1}^{\infty} m^k \int_0^1 |2(F - \tfrac12)|^m\, C_n F^n (1 - F)^n\, dF.$

PROOF. Use Lemma 1 and the fact ([1], p. 33) that if $a_m \ge 0$ and
$m(a_m/a_{m+1} - 1)$ approaches r > 1, then $\sum_{m=1}^{\infty} a_m$ is convergent; or apply
Stirling's approximation, with m large, to the gamma functions obtained by putting
c = 1/2 in (6).

THEOREM 2. Let F(x) be the cdf of a given distribution and $\xi$ and $\tilde{x}$ be respectively
the median and the sample median of a sample of size 2n + 1. Suppose that x(F),
the inverse function of F(x), is for 0 < F < 1 uniquely defined and equal to a
convergent series of powers of $F - \tfrac12$; let

(21)  $x(F) - \xi = \sum_{m=1}^{\infty} a_m (F - \tfrac12)^m.$

Write

(22)  $S_m = \sum_{r=1}^{m} a_r (F - \tfrac12)^r, \quad\text{and}\quad R_m = \sum_{r=m+1}^{\infty} a_r (F - \tfrac12)^r.$

If there exists a sequence $\{b_m\}$ such that

(23)  $\sum_{m=1}^{\infty} (a_m/b_m)^2 < \infty,$

(24)  $\sum_{m=1}^{\infty} b_m^2 (F - \tfrac12)^{2m} < \infty,$

for 0 < F < 1, and

(25)  $\sum_{m=1}^{\infty} b_m^2 \int_0^1 (F - \tfrac12)^{2m}\, C_n F^n (1 - F)^n\, dF < \infty,$

for some positive integer value $n_0$ of n, and $\tilde{\mu}_2$, the variance of $\tilde{x}$, is finite for $n = n_0$,
then for all integers $n > n_0$,

(26)  $\tilde{\mu}_2 = \lim_{m \to \infty} \left[ \int_0^1 S_m^2\, C_n F^n (1 - F)^n\, dF - \left( \int_0^1 S_m\, C_n F^n (1 - F)^n\, dF \right)^2 \right].$

Further, if f(x) is symmetric with respect to $\xi$, then the second term in the bracket
should be omitted.
PROOF. For simplicity we assume that f(x) is symmetric with respect to $x = \xi$.
In this case $\tilde{\mu}_1 = \xi$ and

(27)  $\tilde{\mu}_2 = \int_{-\infty}^{\infty} (x - \xi)^2\, C_n [F(x)]^n [1 - F(x)]^n f(x)\, dx = \int_{-\infty}^{a} + \int_{a}^{b} + \int_{b}^{\infty} = I_1 + I_2 + I_3,$

where $a < \xi$ and $b > \xi$ will be chosen later. It can be shown that

(28)  $I_1 + I_3 = O(n^{1/2} r^n),$

where r is given by (17). Choose a, b, and c such that
$0 < \tfrac12 - F(a) = F(b) - \tfrac12 = c < \tfrac12$. Using (21) and (22), we have


(29) ih < [ S.C,.F\1 — F)"dF|SAat+atats, 
0 


where 


1 
(30) [ S2.C,F"(1 — F)" dF, 
+e 


nt—c 
(31) . | S2.C,F"(1 — F)" dF, 
Jo 
i+e 


(32) Js R..C,F"(1 — F)" dF, 


j—c 


i+e 
(33) Je= [> 2 SaRm| CaP" — FY" aF. 
By Schwarz’s inequality, we get 


(34) h+hsog-d>(* =) 3h [ (F — 3)"C,.F""(1 — F)"" dF. 


m==1 


By (23) and (25), the two series on the RHS are convergent for $n > n_0$. Hence
if $n > n_0$, $J_1 + J_2$ tends to 0 as c tends to 1/2.
Further, from (32),


x 2 oO 1 
(35) As dD (*) A [ bi(F — 4)"C,F™(1 — F)" dF, 
r=m+1 © 


r=m+1 /0 


oo 2 2 . 
(36) Js 2[(*). be (¢ )]: 0 [ (F — 4)"C,F"(1 — F)" dF. 
r=l b, r=m+1 b, r=l 
As m tends to infinity, $J_3 + J_4$ tends to 0. Consequently for any fixed $n > n_0$,

(37)  $\tilde{\mu}_2 = \lim_{m \to \infty} \int_0^1 S_m^2\, C_n F^n (1 - F)^n\, dF.$

An immediate consequence of Lemma 2 and Theorem 2 is

THEOREM 3. If, in the preceding theorem, $a_m = O(2^m m^k)$ for some integer k, then
(26) holds for every n > 2k + 3.
PROOF. Choose $b_m = 2^m m^{k+1}$.

Thus we have found an approximation

(38)  $\tilde{\mu}_2 \approx \int_0^1 S_m^2\, C_n F^n (1 - F)^n\, dF,$

for all integers n which are not too small. The integral on the RHS of (38) can
be evaluated by formulas given in Lemma 1. An upper bound for the error
committed in such approximation is given by the sum of the RHS of (35) and (36).
Finally we note that the same method can be used to obtain the moments of $\tilde{x}$ in
general.

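EXAMPLE 1. Laplace distribution. Let $f(x) = \tfrac12 e^{-|x|}$, then $F(x) = 1 - \tfrac12 e^{-x}$
if x ≥ 0, and $F(x) = \tfrac12 e^{x}$ if x < 0. Hence

(39)  $\tilde{\mu}_2 = 2 \int_{1/2}^{1} x^2\, C_n F^n (1 - F)^n\, dF.$

If 1/2 < F < 1, then $x = -\log 2(1 - F) = \sum_{m=1}^{\infty} m^{-1} [2(F - \tfrac12)]^m$. So $a_m = 2^m m^{-1}$.
It follows that for n > 1,

(40)  $\tilde{\mu}_2 = \lim_{m \to \infty} 2 \int_{1/2}^{1} \left\{ \sum_{r=1}^{m} r^{-1} [2(F - \tfrac12)]^r \right\}^2 C_n F^n (1 - F)^n\, dF$

(41)  $\quad\;\; = \sum_{m=1}^{\infty} w_m \int_0^1 |2(F - \tfrac12)|^{m+1}\, C_n F^n (1 - F)^n\, dF,$

where ([1], p. 84),

(42)  $w_m = \sum_{r=1}^{m} r^{-1} (m - r + 1)^{-1} = 2 (m + 1)^{-1} \sum_{r=1}^{m} r^{-1}.$

If we use

(43)  $\tilde{\mu}_2 \approx \sum_{m=1}^{2k-1} w_m \int_0^1 |2(F - \tfrac12)|^{m+1}\, C_n F^n (1 - F)^n\, dF,$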
then the error committed is bounded by 


1/2 1 \-12 2:4--- 2k = 
(90) 20a (1 +9, 0+" eo E a Oe FIEND 


In deriving (43a), we used the facts that $w_m$ is a monotonically decreasing
sequence of m ([1], p. 85) and that the Wallis product ([7], p. 385)

$\dfrac{1 \cdot 3 \cdots (2n - 1)}{2 \cdot 4 \cdots (2n)}\, n^{1/2} \to \pi^{-1/2}$

is a monotonically increasing sequence of n. Similarly if we use

(44)  $\tilde{\mu}_2 \approx \sum_{m=1}^{2k} w_m \int_0^1 |2(F - \tfrac12)|^{m+1}\, C_n F^n (1 - F)^n\, dF,$

then the error is bounded by


1 1-3 --- (k+ 1) 

44 ' sere teeead sear es 
tte) Zwdoe1 (1 + i) (2n + 1)(2n + 3) --+ (Qn + 2k + 1)’ 
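
The series (41) can be compared numerically with a direct quadrature of (39). The sketch below is an illustration only: it assumes the reading of (41) and (42) in which the m-th term is $w_m \int_0^1 |2(F-\tfrac12)|^{m+1} C_n F^n (1-F)^n\,dF$ with $w_m = 2(m+1)^{-1}\sum_{r \le m} r^{-1}$, and it evaluates each integral in closed form from (6) with c = 1/2, namely $C_n 2^{-(2n+1)} B((p+1)/2,\, n+1)$ for exponent p. The function names and the truncation and step-count defaults are choices made here, not the authors'.

```python
import math


def c_n(n):
    """C_n = (2n+1)!/(n! n!)."""
    return math.factorial(2 * n + 1) / math.factorial(n) ** 2


def abs_moment(p, n):
    """Integral of |2(F - 1/2)|^p C_n F^n (1-F)^n over [0, 1], via Lemma 1 (6) with c = 1/2."""
    a = (p + 1) / 2.0
    log_beta = math.lgamma(a) + math.lgamma(n + 1) - math.lgamma(a + n + 1)
    return c_n(n) * 2.0 ** (-(2 * n + 1)) * math.exp(log_beta)


def laplace_median_variance_series(n, terms=3000):
    """Partial sum of the series (41) for the variance of the median of 2n+1 Laplace observations."""
    total, harmonic = 0.0, 0.0
    for m in range(1, terms + 1):
        harmonic += 1.0 / m
        w_m = 2.0 * harmonic / (m + 1)   # (42)
        total += w_m * abs_moment(m + 1, n)
    return total


def laplace_median_variance_quadrature(n, steps=100000):
    """Midpoint-rule evaluation of (39): 2 * integral over (1/2, 1) of x(F)^2 C_n F^n (1-F)^n."""
    h = 0.5 / steps
    total = 0.0
    for i in range(steps):
        F = 0.5 + (i + 0.5) * h
        x = -math.log(2.0 * (1.0 - F))   # inverse cdf for F > 1/2
        total += x * x * (F * (1.0 - F)) ** n
    return 2.0 * c_n(n) * total * h
```

For 2n + 1 = 7 the two evaluations should agree closely, and the result can be set against 2/(2n + 1), the variance of the sample mean quoted in the Remark of Section 7.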
EXAMPLE 2. Cauchy distribution. Let $f(x) = 1/\pi(1 + x^2)$, then

$F(x) = \pi^{-1} [\tan^{-1} x + \pi/2],$

for $-\infty < x < \infty$, so $x(F) = \tan \pi(F - \tfrac12)$ for 0 < F < 1. It can be shown
that the variance of the sample median of a sample of size 2n + 1 ≥ 5 is finite:

(45)  $\tilde{\mu}_2 = \int_0^1 \tan^2 \pi(F - \tfrac12)\; C_n F^n (1 - F)^n\, dF.$

It is known ([7], pp. 204, 237) that

(46)  $\tan x = \sum_{m=1}^{\infty} (-1)^{m-1}\, \dfrac{2^{2m} (2^{2m} - 1) B_{2m}}{(2m)!}\, x^{2m-1}, \quad \text{for } |x| < \dfrac{\pi}{2},$

where

$B_{2m} = (-1)^{m-1}\, \dfrac{2 (2m)!}{(2\pi)^{2m}} \sum_{r=1}^{\infty} r^{-2m}.$

We see that $a_m = O(2^m)$, hence by Theorem 3, (37) holds if n > 3.


5. Normal distribution. Throughout this section, $f(x) = (2\pi)^{-1/2} e^{-x^2/2}$ and
$F(x) = \int_{-\infty}^{x} f(t)\, dt$, and x(F) is the inverse function of F(x). No simple general
form of the derivatives of x(F) at F = 1/2 is known. But the first few derivatives
of x(F) can be obtained by direct differentiation, e.g.,

$\dfrac{dx}{dF} = \dfrac{1}{f(x)}, \quad \dfrac{d^2 x}{dF^2} = \dfrac{1}{f(x)} \dfrac{d}{dx}\left( \dfrac{1}{f(x)} \right), \ \ldots$

For finite m and 0 < F < 1, let

(47)  $x(F) = a_1 (F - \tfrac12) + a_2 (F - \tfrac12)^2 + \cdots + a_m (F - \tfrac12)^m + R_{m+1} (F - \tfrac12)^{m+1},$

then

(48)  $a_2 = a_4 = \cdots = 0,$
      $a_1 = (2\pi)^{1/2}, \quad a_3 = (2\pi)^{3/2}/3!, \quad a_5 = 7(2\pi)^{5/2}/5!, \ \ldots,$
      $R_5 = \dfrac{(2\pi)^{5/2}}{5!} \left[ (7 + 46x^2 + 24x^4)\, e^{5x^2/2} \right]_{F_0}, \ \ldots,$

where $[g(x)]_{F_0} = g[x(F_0)]$, $F_0 = \tfrac12 + \theta(F - \tfrac12)$ and 0 < θ ≤ 1.
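
The coefficients in (48) can be sanity-checked against a library inverse normal cdf. This is a sketch for illustration, assuming the values $a_1 = (2\pi)^{1/2}$, $a_3 = (2\pi)^{3/2}/3!$, $a_5 = 7(2\pi)^{5/2}/5!$; it uses `statistics.NormalDist().inv_cdf` from the Python standard library, and `truncated_inverse_cdf` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

# Taylor coefficients of x(F) about F = 1/2 for the unit normal, as in (48).
A1 = math.sqrt(2 * math.pi)
A3 = (2 * math.pi) ** 1.5 / math.factorial(3)
A5 = 7 * (2 * math.pi) ** 2.5 / math.factorial(5)


def truncated_inverse_cdf(F):
    """Three-term Taylor polynomial a_1 v + a_3 v^3 + a_5 v^5 with v = F - 1/2."""
    v = F - 0.5
    return A1 * v + A3 * v ** 3 + A5 * v ** 5


def truncation_error(F):
    """Exact inverse cdf minus the truncated expansion."""
    return NormalDist().inv_cdf(F) - truncated_inverse_cdf(F)
```

For F slightly above 1/2 the error is positive and of the order of $(F - \tfrac12)^7$, consistent with the omitted terms of the expansion being non-negative on that side.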

A. A lower bound. Take the integral (39), and let the range of integration be
divided into two: 1/2 to 1/2 + c, and 1/2 + c to 1, where 0 < c < 1/2. If we neglect
the last integral; in the first integral, replace x(F) by its expansion (47) with
m = 6, and then neglect all terms containing the remainder term (which is
non-negative), and finally let c approach 1/2 and use Lemma 1, we are able to
obtain a lower bound for the variance $\tilde{\mu}_2$ of $\tilde{x}$ of a normal parent distribution
with unit variance, i.e.,

(49)  $\tilde{\mu}_2 \ge \lambda_n \left[ 1 + \dfrac{\pi}{2(2n + 5)} + \dfrac{13\pi^2}{24(2n + 5)(2n + 7)} \right],$

where

(50)  $\lambda_n = \pi/2(2n + 3).$

Incidentally, for n ≥ 3, we may expand the RHS of (49) into a power series in
$(2n + 1)^{-1}$ and obtain an approximation

(51)  $\tilde{\mu}_2 \approx \bar{\mu}_2 \left[ 1 - (2 - \pi/2)(2n + 1)^{-1} - (3\pi - 4 - 13\pi^2/24)(2n + 1)^{-2} + \cdots \right],$

where $\bar{\mu}_2$ is given by (2). In terms of standard deviations, and with the numerical
values of the coefficients computed, (51) is equivalent to

(52)  $\tilde{\mu}_2^{1/2} \approx \bar{\mu}_2^{1/2} \left[ 1 - (.2146)(2n + 1)^{-1} - (.0806)(2n + 1)^{-2} + \cdots \right].$

This agrees with a formula obtained by Pearson for the same purpose ([8], p. 363).
B. Approximations for large samples. 
af +2f =n+h, 
0 a 


say. Since F(x)[1 — F(x)| S F(a)[l — F@]ifz =a 2 OandC, S (27) "1 + 
(2n)~*|(2n + 1)"72°"** it follows that 


ll 


(58) m= 2) xCP()I" — F@NM2) dx 


(54) I/de S (2/n)*?(1 +3/2n)(2n + 1)°"[4F(a)(1 — F(a)))” a/ a*(2n) te" de. 
In J,, use F as the independent variable, and replace z(F) by ai(F — 4) + 


a,(F — 3)° + M,(F — 3)° where M; = MAXosz<a Rs = (2r)°"A/5! and A = 
(7 + 46a” + 24a')e”, then 


~ r\' (2A 1 3-5 
ni S) T sew * (5) Gi 7 3) @n + 5)Qn +7) 


r\ 2A 3-8-7 
- m (5) 315! (2n + 5)(2n + 7)(2n + 9) 


2 5!/ (2n + 5) -++ (2n + 11)" 


Combining (49), (53), (54), and (55), we conclude:

(56)  $\tilde{\mu}_2 \approx$ First approximation: $\lambda_n = \dfrac{\pi}{2(2n + 3)}$;
      Second approximation: $\lambda_n \left[ 1 + \dfrac{\pi}{2(2n + 5)} \right].$

TABLE 1
Proportional error

Sample size    First approximation    Second approximation
    501           3.2 × 10^-3            6.8 × 10^-5
    201           8.5 × 10^-3            7.8 × 10^-4
    101           2.2 × 10^-2            6.9 × 10^-3
     51           9.2 × 10^-2            6.3 × 10^-2


If the second approximation is used, an upper bound for the proportional error
(defined to be | (True value / Approximation) − 1 |) is given by the sum of the
RHS of (54) and the last three terms of that of (55). If the first approximation is
used, then there is an additional error $\pi/2(2n + 5)$, the second term in the
bracket of the second approximation.

Table 1 is given for illustration. We choose successively for a: .35, .50, .65,
.75. It is to be noted that the RHS of (54) is a decreasing function of n for, e.g.,
n ≥ 25, a = .75. The RHS of (55) is obviously also a decreasing function of n.
Therefore what Table 1 means is that, e.g., for sample sizes ≥ 51 (not just = 51),
if the first approximation is used, then the proportional error is < .092, or
explicitly: $1 \le \tilde{\mu}_2/\lambda_n \le 1.092$.
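
The two approximations in (56) can be compared with a direct numerical evaluation of the variance from the density (1). The sketch below is illustrative only: it assumes the readings $\lambda_n = \pi/2(2n+3)$ and $\lambda_n[1 + \pi/2(2n+5)]$ for the first and second approximations, and integrates $x^2 g(x)$ by the trapezoidal rule on a finite interval. The function names, interval and step count are choices made here.

```python
import math


def normal_median_variance(n, half_width=8.0, steps=20000):
    """Trapezoidal evaluation of the variance of the median of 2n+1 unit normal observations."""
    c_n = math.factorial(2 * n + 1) / math.factorial(n) ** 2
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        cdf = 0.5 * math.erfc(-x / math.sqrt(2.0))      # F(x)
        sf = 0.5 * math.erfc(x / math.sqrt(2.0))        # 1 - F(x)
        pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        value = x * x * c_n * (cdf * sf) ** n * pdf     # x^2 g(x), with g as in (1)
        total += value if 0 < i < steps else 0.5 * value
    return total * h


def first_approximation(n):
    """lambda_n = pi / (2(2n+3)), as in (50) and (56)."""
    return math.pi / (2.0 * (2 * n + 3))


def second_approximation(n):
    """lambda_n * (1 + pi / (2(2n+5))), as in (56)."""
    return first_approximation(n) * (1.0 + math.pi / (2.0 * (2 * n + 5)))
```

With n = 25 (sample size 51) the ratio of the computed variance to `first_approximation(25)` can be set against the bound 1.092 read from Table 1.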


6. Normal distribution: a different approach. In this section a different
method is used to derive upper and lower bounds for the variance of $\tilde{x}$ of a
normal parent distribution with unit variance. We state

THEOREM 4. Let $\tilde{x}$ and $\tilde{\mu}_2$ be respectively the sample median and its variance of a
sample of size 2n + 1 drawn from a normal distribution with unit variance. If
$\bar{\mu}_2 = \pi/2(2n + 1)$ is the variance of the asymptotic distribution of $\tilde{x}$, then

(57)  $B_n \left( 1 - \dfrac{1}{2n + 2} \right)^{3/2} \le \tilde{\mu}_2/\bar{\mu}_2 \le B_n \left( 1 + \dfrac{1}{2n} \right)^{3/2},$

where $B_n = C_n (\tfrac12)^{2n+1} (2\pi)^{1/2}/(2n + 1)^{1/2}$, and $C_n = (2n + 1)!/(n!\,n!)$. Further,
for all practical purposes and n ≥ 4,


o) “ae 
8n 24n?(2n + 1) 





1 


sili 16n(8n — 1)’ 


<n 41424 
8n 
or 
5 . on’ 
PROOF. By using the following transformations consecutively,

(60)  $u = F(y), \quad v = u - \tfrac12,$

where $F(y) = \int_{-\infty}^{y} (2\pi)^{-1/2} e^{-x^2/2}\, dx$, we obtain

(61)  $\tilde{\mu}_2 = 2 C_n (\tfrac14)^n \int_0^{1/2} y^2 (1 - 4v^2)^n\, dv.$

Let

(62)  $v^2 = \tfrac14 \left( 1 - e^{-t^2/(2n+1)} \right),$

then

(63)  $\tilde{\mu}_2 = 2 B_n \int_0^{\infty} (2\pi)^{-1/2}\, y^2\, e^{-(n+1)t^2/(2n+1)}\, h_1\!\left( t/(2n + 1)^{1/2} \right) dt,$

where $h_1(t) = t (1 - e^{-t^2})^{-1/2} \ge 1$ for all t ≥ 0. Further, it is known [10] that

(64)  $\int_0^{y} (2\pi)^{-1/2} e^{-x^2/2}\, dx \le \tfrac12 \left( 1 - e^{-2y^2/\pi} \right)^{1/2}.$

Using (64), it can be shown that $y^2 \ge \bar{\mu}_2 t^2$. Therefore it follows from (63) that

(65)  $\tilde{\mu}_2 \ge B_n \left( 1 - \dfrac{1}{2n + 2} \right)^{3/2} \bar{\mu}_2.$

On the other hand, we have from (63),

(66)  $\tilde{\mu}_2 = 2 B_n \bar{\mu}_2 \int_0^{\infty} (2\pi)^{-1/2}\, t^2\, e^{-n t^2/(2n+1)} \left\{ (2/\pi)\, y^2\, h_2\!\left( t/(2n + 1)^{1/2} \right) \right\} dt,$

where $h_2(t) = e^{-t^2}/t(1 - e^{-t^2})^{1/2}$. If we can show that $(2/\pi)\, y^2 h_2(t/(2n + 1)^{1/2}) \le 1$
for all t ≥ 0, then

(67)  $\tilde{\mu}_2 \le B_n \left( 1 + \dfrac{1}{2n} \right)^{3/2} \bar{\mu}_2.$

Now $y^2 h_2(t/(2n + 1)^{1/2}) \le g_0(y)$ where

(68)  $g_0(y) = y^2 (1 - 4v^2)/4v^2.$

It can be seen that $\lim_{y \to 0} g_0(y) = \pi/2$. Hence it suffices for our purpose to show
that $g_0(y)$ is decreasing. Let a prime denote differentiation with respect to y. Then,

(69)  $g_0'(y) = (y/2v^3)\, g_1(y),$

where

(70)  $g_1(y) = v(1 - 4v^2) - y v',$
      $g_1'(y) = v' g_2(y)$, where $g_2(y) = y^2 - 12v^2,$

(71)  $g_2'(y) = (12/\pi)\, e^{-y^2} g_3(y)$, where
      $g_3(y) = (\pi/6)\, y\, e^{y^2} - e^{y^2/2} \int_0^{y} e^{-x^2/2}\, dx.$

It is known [10] that

(72)  $e^{y^2/2} \int_0^{y} e^{-x^2/2}\, dx = \sum_{n=0}^{\infty} y^{2n+1}/1 \cdot 3 \cdots (2n + 1).$

Hence $g_3(y) = \sum_{n=0}^{\infty} \{ \pi/6\,n! - 1/1 \cdot 3 \cdots (2n + 1) \}\, y^{2n+1}$. It can be shown, by
an argument similar to that used in [10] for a similar purpose, that

(73)  $g_3(y) = y^3 \left[ a_0 y^{-2} + a_1 + a_2 y^2 + \cdots \right],$

where $a_0 < 0$ and $a_i > 0$, i = 1, 2, ... . Hence there exists a $y_0 > 0$ such that
$g_3(y) \le 0$ if $0 \le y \le y_0$ and $g_3(y) \ge 0$ if $y \ge y_0$. So as y increases from 0 to ∞,
$g_2(y)$ decreases steadily from 0 to a minimum and then increases steadily to ∞.
Consequently $g_1(y)$ first decreases steadily and then increases steadily. As

$\lim_{y \to 0} g_1(y) = \lim_{y \to \infty} g_1(y) = 0,$

it becomes clear that $g_1(y) \le 0$ for all y ≥ 0. Therefore $g_0(y)$ is a decreasing
function of y. This completes the proof.
Finally we note that (58) is obtained by using n! ~ (2x)'?n"*M%e-" tm 
REMARK 1. The lower bound for $\tilde{\mu}_2$ given by (49) is better than the one given
by (57) if we use (59) for $B_n$. This is so even if the last term at the RHS of (49)
is omitted. For

(74)  $\dfrac{\pi}{2(2n + 3)} + \dfrac{\pi^2}{4(2n + 3)(2n + 5)} = \dfrac{\pi}{2(2n + 1)} \left[ 1 - \dfrac{(8 - 2\pi)n + 20 - \pi}{2(2n + 3)(2n + 5)} \right].$

Now if n ≥ 2, the last term in the bracket of (74) is smaller than $(2n + 2)^{-1}$
and $(1 + 1/8n)[1 - 1/(2n + 2)]^{1/2} < 1$. Therefore the quantity in the bracket
of (74) is greater than

$\left( 1 + \dfrac{1}{8n} \right) \left( 1 - \dfrac{1}{2n + 2} \right)^{3/2}.$

For n = 1, direct comparison shows also that (74) is greater than the LHS of
(57).

REMARK 2. Since the upper bound (57) for $\tilde{\mu}_2$ is greater than $\bar{\mu}_2 (1 + 1/2n)$, we
cannot be sure that in using $\bar{\mu}_2$ as an approximation to $\tilde{\mu}_2$, the proportional error
is less than 1/2n. But if the second approximation given by (56) is used, the
proportional error is much smaller than 1/2n for large samples (Table 1). One
may say that (56) is a "better" approximation for large samples.


7. Laplace distribution. We shall now employ the same technique, used in
Section 6, to derive upper and lower bounds for the variance $\tilde{\mu}_2$ of the sample
median $\tilde{x}$ of a sample of size 2n + 1 drawn from a Laplace distribution with pdf

(75)  $f(x) = \tfrac12 e^{-|x|}.$

Clearly, the variance in this case of the asymptotic distribution of $\tilde{x}$ is

(76)  $\bar{\mu}_2 = \dfrac{1}{2n + 1}.$

We state
THEOREM 5. If $\tilde{\mu}_2$ and $\bar{\mu}_2$ are as defined above, then

(77)  $B_n \left( 1 - \dfrac{1}{2n + 2} \right)^{3/2} \le \tilde{\mu}_2/\bar{\mu}_2 \le 1.51\, B_n \left( 1 + \dfrac{1}{2n} \right)^{3/2},$

where $B_n$ is given by (57) and (59).
PROOF. It can be seen easily that $\tilde{\mu}_2$ is equal to the RHS of (63) if v and t
satisfy (62) and

(78)  $v = \int_0^{y} \tfrac12 e^{-x}\, dx.$

We proved [4] that for all y ≥ 0,

(79)  $v \le \tfrac12 \left( 1 - e^{-y^2} \right)^{1/2}.$

From (62) and (79), we have $y^2 \ge \bar{\mu}_2 t^2$. Hence it follows from (63) that $\tilde{\mu}_2/\bar{\mu}_2$
has a lower bound given by (77).
Further, from (63)

(80)  $\tilde{\mu}_2 = 2 B_n \bar{\mu}_2 \int_0^{\infty} (2\pi)^{-1/2}\, t^2\, e^{-n t^2/(2n+1)}\, y^2 h_2\!\left( t/(2n + 1)^{1/2} \right) dt,$

where $h_2(t)$ is given by (66). We say that

(81)  $y^2 h_2\!\left( t/(2n + 1)^{1/2} \right) \le 1.51.$

If this is true, then the RHS of (77) is an upper bound of $\tilde{\mu}_2/\bar{\mu}_2$. Thus the proof
is completed.

To establish (81), we introduce, as in (68),

(82)  $g_0(y) = y^2 (1 - 4v^2)/4v^2,$

where y and v satisfy (78). For all y ≥ 0, $g_0(y)$ is not smaller than the LHS of
(81) and

(83)  $\lim_{y \to 0} g_0(y) = 1, \quad \lim_{y \to \infty} g_0(y) = 0.$

Let a prime denote differentiation with respect to y, then

(84)  $g_0'(y) = \dfrac{y}{2v^3}\, g_1(y),$

where

(85)  $g_1(y) = v - 4v^3 - \tfrac12 y e^{-y}.$

(86)  $g_1'(y) = \tfrac12 e^{-y} g_2(y),$

where

(87)  $g_2(y) = -12v^2 + y.$

(88)  $g_2'(y) = -12 v e^{-y} + 1 = g_3(y).$

(89)  $g_3'(y) = 12 e^{-y} (\tfrac12 - e^{-y}).$

If f(x) is a function of x, and if as x increases from 0 to ∞, f(x) varies from, e.g.,
positive to negative, and then back to positive, we will write, for simplicity, as
x: 0 → ∞, f(x): +, −, +. Now

(90)  $g_3'(y) \lessgtr 0$ according as $y \lessgtr \log 2,$

and $g_3(0) = g_3(\infty) = 1$, and $g_3(\log 2) = -\tfrac12$. So as y: 0 → ∞, $g_3(y)$: +, −, +.
Now $g_2(0) = 0$, while $g_2(\infty) = \infty$. We say that as y: 0 → ∞, $g_2(y)$: +, −, +.
Otherwise $g_2(y) \ge 0$ for all y ≥ 0, so $g_1'(y) \ge 0$ and $g_1(y) \ge 0$ for all y ≥ 0, as
$g_1(0) = 0$. Hence $g_0(y)$ is steadily increasing. This, however, contradicts (83).
It follows that as y: 0 → ∞, $g_1'(y)$: +, −, +. Now $g_1(0) = g_1(\infty) = 0$, hence as
y: 0 → ∞, $g_1(y)$: +, −. Therefore we conclude that as y: 0 → ∞, $g_0(y)$ increases
steadily from 1 to a maximum, and then decreases steadily to 0. To find the
maximum of $g_0(y)$, we first solve $g_1(y) = 0$, which is equivalent to $2v(1 + 2v) - y = 0$.
Using table [12], we obtain an approximate solution y = 1.15. The
maximum of $g_0(y)$ is then found to be 1.51.
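
The constant 1.51 can be reproduced without tables. The sketch below is an illustration, not the authors' computation (they used table [12]): it assumes $v = \tfrac12(1 - e^{-y})$ from (78) and $g_0(y) = y^2(1 - 4v^2)/4v^2$ from (82), and solves the stationarity condition $2v(1 + 2v) - y = 0$ by bisection on a bracket chosen here.

```python
import math


def v(y):
    """v = integral of (1/2) e^{-x} over [0, y], as in (78)."""
    return 0.5 * (1.0 - math.exp(-y))


def g0(y):
    """g_0(y) = y^2 (1 - 4 v^2) / (4 v^2), as in (82)."""
    return y * y * (1.0 - 4.0 * v(y) ** 2) / (4.0 * v(y) ** 2)


def stationarity(y):
    """2v(1 + 2v) - y, whose positive root locates the maximum of g_0."""
    return 2.0 * v(y) * (1.0 + 2.0 * v(y)) - y


def bisect(func, lo, hi, tol=1e-12):
    """Plain bisection; assumes func changes sign exactly once on [lo, hi]."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


y_star = bisect(stationarity, 0.5, 3.0)   # bracket chosen by inspection
g0_max = g0(y_star)
```

The root comes out close to the tabulated y = 1.15 and the maximum close to 1.51, in line with the values quoted in the proof.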

REMARK. The variance of the sample mean (of a sample of size 2n + 1) drawn
from a Laplace distribution with pdf given by (75) is 2/(2n + 1). It follows,
from Theorem 5, that the sample median has smaller variance than the sample
mean for sample size 2n + 1 ≥ 7. In a recent paper, Sarhan [11] found that for
sample sizes equal to 2, 3, 4, and 5, the variance of the sample median is also
smaller than that of the sample mean.
MAXIMUM LIKELIHOOD ESTIMATES OF MONOTONE PARAMETERS
By H. D. Brunk
University of Missouri


1. Summary. The maximum likelihood estimators of distribution parameters
subject to certain order relations are determined for simultaneous sampling from
a number of populations, when (i) the order relations may be specified by
regarding the distribution parameters, of which one is associated with each
population, as values at specified points of a function of n variables (n a positive integer),
monotone in each variable separately; (ii) the distributions of the populations
from which sample values are taken belong to the exponential family defined
below. This family includes, in particular, the binomial, the normal with fixed
standard deviation and variable mean, the normal with fixed mean and variable
standard deviation, and the Poisson distributions.


2. Monotone parameters, and the exponential family of distributions. Let
m, n be positive integers. Let t denote an n-tuple, $t = (t^1, t^2, \ldots, t^n)$, of real
numbers, and let $t_k$ (k = 1, 2, ..., m) denote one member of a set consisting
of m such n-tuples. Let there be m populations from which sample values are to
be taken, and let the distribution of the k-th be completely specified by the
knowledge of a single distribution parameter $\theta_k$ (k = 1, 2, ..., m). Let the
parameters $\theta_k$ be known to satisfy the following monotonicity condition: there
is a real-valued function $\theta(t)$, monotone non-decreasing in each of the separate
variables $t^i$ (i = 1, 2, ..., n), such that $\theta_k = \theta(t_k)$, k = 1, 2, ..., m. (If $\theta(t)$
should be monotone non-increasing in some or all of the variables $t^i$, a change in
direction of the corresponding axis in the n-space would serve to render
$\theta(t)$ non-decreasing in each variable.)

Let each population be a population of values of a stochastic variable, x,
such that there exist a finite or infinite interval I, a number $c \in I$, a
probability measure ν on the Borel sets of the real line, a Baire function g(x), and a
strictly decreasing, continuous function r(v) for $v \in I$ such that the density
function of x with respect to ν is given by $\exp\{-F[g(x), \theta]\}$ for some $\theta \in I$, where

(1)  $F(u, \theta) = \int_c^{\theta} (u - z)\, dr(z) = \gamma(\theta) + u[r(\theta) - r(c)],$

and where

$\gamma(\theta) = -\int_c^{\theta} z\, dr(z).$

Then if A is a Borel set on the real line,

$\Pr\{x \in A\} = \int_A \exp\{-F[g(x), \theta]\}\, d\nu.$

Let the distributions of the stochastic variables associated with the m
populations be obtained by holding I, c, ν, g, and r fixed, and letting θ take on the
values $\theta_1, \theta_2, \ldots, \theta_m$.

As particular instances of such distributions, one obtains the normal
distribution with mean θ, variance 1, on setting I = (−∞, ∞), c = 0, g(x) = x,
r(v) = −v, $d\nu = \exp(-x^2/2)/\sqrt{2\pi}\, dx$; the binomial with parameter θ on setting
I = (0, 1), c = 1/2, g(x) = x, r(v) = log [(1 − v)/v], ν the measure assigning
measure 1/2 to each of the two points 0 and 1; the normal with mean 0, variance
θ on setting I = (0, ∞), c = 1, $g(x) = x^2$, r(v) = (1 − v)/2v,
$d\nu = \exp(-x^2/2)/\sqrt{2\pi}\, dx$; and the Poisson with mean θ on setting I = [0, ∞), c = 1, g(x) = x,
r(v) = −log v, ν the measure assigning measure $e^{-1}/x!$ to each non-negative
integer x.
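
The four instances above can be checked numerically. The sketch below is an illustration under a stated assumption: it reads definition (1) as $F(u, \theta) = \int_c^{\theta} (u - z)\, dr(z)$, evaluates it by the midpoint rule using the derivative r'(z) of each listed r, and multiplies $\exp\{-F[g(x), \theta]\}$ by the density or mass of ν. All function names and the step count are choices made here.

```python
import math


def F(u, theta, c, r_prime, steps=4000):
    """Midpoint-rule value of the integral of (u - z) r'(z) dz from c to theta."""
    h = (theta - c) / steps
    total = 0.0
    for i in range(steps):
        z = c + (i + 0.5) * h
        total += (u - z) * r_prime(z)
    return total * h


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def density_normal_mean(x, theta):
    # I = (-inf, inf), c = 0, g(x) = x, r(v) = -v, nu = standard normal measure
    return math.exp(-F(x, theta, 0.0, lambda z: -1.0)) * normal_pdf(x, 0.0, 1.0)


def pmf_binomial(x, theta):
    # I = (0, 1), c = 1/2, g(x) = x, r(v) = log((1 - v)/v), nu = mass 1/2 at 0 and at 1
    return math.exp(-F(x, theta, 0.5, lambda z: -1.0 / (z * (1.0 - z)))) * 0.5


def density_normal_variance(x, theta):
    # I = (0, inf), c = 1, g(x) = x^2, r(v) = (1 - v)/(2v), nu = standard normal measure
    return math.exp(-F(x * x, theta, 1.0, lambda z: -1.0 / (2.0 * z * z))) * normal_pdf(x, 0.0, 1.0)


def pmf_poisson(x, theta):
    # c = 1, g(x) = x, r(v) = -log v, nu = mass e^{-1}/x! at each non-negative integer x
    return math.exp(-F(x, theta, 1.0, lambda z: -1.0 / z)) * math.exp(-1.0) / math.factorial(x)
```

Each function should reproduce the familiar density or probability, for example `pmf_binomial(1, theta)` returning approximately `theta`.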

The distributions described above belong to the exponential family ([5], p.
65). It is known that in sampling from a single population whose distribution
depends on a single parameter, under conditions implying sufficient regularity,
only distributions having density functions of the form

$\exp\{-[\gamma(\theta) + g(x) r(\theta)]\}$

with respect to a measure admit sufficient estimators ([6], [7]) and efficient
estimators (as defined in [4], pp. 479-481), and that the maximum likelihood
estimator is sufficient and efficient. It is further clear that, given sufficient regularity,
if the density function is of the form $\exp\{-[\chi(p) + g(x)\psi(p)]\}$ then the change
of parameter $\theta = -\chi'(p)/\psi'(p)$ yields a density function of the form

$\exp\{-[\gamma(\theta) + g(x) r(\theta)]\} = \exp\{-F[g(x), \theta]\},$

where $\gamma'(\theta) = -\theta r'(\theta)$, hence

$F[g(x), \theta] = \int^{\theta} [g(x) - z]\, dr(z)$

(except perhaps for an additive function of x which may be absorbed in the
measure ν).

The population whose distribution is determined by the distribution parameter
$\theta_k = \theta(t_k)$ will be referred to as the population at $t_k$ (k = 1, 2, ..., m). Let there
be drawn from the population at $t_k$, N(k) sample values, $x_{1k}, x_{2k}, \ldots, x_{N(k),k}$.
The sample stochastic variable $x_{jk}$ has the density function $\exp\{-F[g(x), \theta_k]\}$
with respect to the measure ν, and for distinct j the stochastic variables $x_{jk}$
are independent (j = 1, 2, ..., N(k); k = 1, 2, ..., m).

The problem is to determine the maximum likelihood estimators of the
parameters $\theta_1, \ldots, \theta_m$, subject to the monotonicity requirement.

If A is a Borel set in the sample space of points $(x_{1k}, x_{2k}, \ldots, x_{N(k),k})$, then
$\Pr\{(x_{1k}, \ldots, x_{N(k),k}) \in A\} = \int_A \exp\left\{ -\sum_{j=1}^{N(k)} F[g(x_j), \theta_k] \right\} d(\nu \times \cdots \times \nu),$

where $\nu \times \cdots \times \nu$ denotes the direct product of the N(k) measures ν. From (1)
it then follows that

$\Pr\{(x_{1k}, \ldots, x_{N(k),k}) \in A\} = \int_A \exp\left\{ -N(k)\, F\left[ \sum_{j=1}^{N(k)} g(x_{jk})/N(k),\; \theta_k \right] \right\} d(\nu \times \cdots \times \nu).$

Similarly, if A is a Borel set in the sample space of points

$(x_{11}, \ldots, x_{N(1),1};\ \ldots;\ x_{1m}, \ldots, x_{N(m),m}),$

then

$\Pr\{(x_{11}, \ldots, x_{N(1),1};\ \ldots;\ x_{1m}, \ldots, x_{N(m),m}) \in A\}$
$\quad = \int_A \exp\left\{ -\sum_{k=1}^{m} N(k)\, F\left[ \sum_{j=1}^{N(k)} g(x_{jk})/N(k),\; \theta_k \right] \right\} d(\nu \times \cdots \times \nu),$

where the integration is with respect to the direct product of the $\sum_{k=1}^{m} N(k)$
measures ν.

The numbers $x_{jk}$, as well as the functions $g$, $F$ and the numbers $N(k)$, are
to be regarded as given; the maximum likelihood estimator of the parameters
$\theta_k = \theta(t_k)$, subject to the monotonicity requirement that $\theta(t)$ be
non-decreasing in each $t^i$ ($i = 1, 2, \ldots, n$), is the set of $m$ numbers for
which the exponential in the integrand assumes its maximum value in the class
of all sets $(\theta_1, \ldots, \theta_m)$ satisfying the monotonicity requirement.
The same set will also minimize the negative logarithm of the exponential:

\[ \sum_{k=1}^{m} N(k)\,F[\bar g_k, \theta_k], \tag{2} \]

where $\bar g_k = \sum_{j=1}^{N(k)} g(x_{jk})/N(k)$ ($k = 1, 2, \ldots, m$). This sum
can be rewritten in the form

\[ \int F[\bar g(t), \theta(t)]\,d\mu, \tag{3} \]

where the measure $\mu$ assigns the measure $N(k)$ to the point $t_k$
($k = 1, 2, \ldots, m$) and where
$\bar g(t_k) = \bar g_k = \sum_{j=1}^{N(k)} g(x_{jk})/N(k)$. The problem of
determining a function $\hat\theta(t)$ minimizing, for given $\bar g(t)$, the
integral (3) in the class of functions $\theta(t)$ monotone non-decreasing in each
$t^i$ ($i = 1, 2, \ldots, n$) is a special instance of the problem discussed in [3].
The problem discussed in [1] is again a special instance of the problem of the
present paper, obtained on setting $n = 1$ and supposing that each population
has a binomial distribution.

The results of the present paper as regards existence, representations, and
uniqueness of the maximum likelihood estimators are contained in corresponding
results obtained by Ewing, Reid, Utz, and Brunk [2], [3]. The purpose of the
present paper is to point out the application of those results to the sampling
situation described above. Further, the fact that in the problem of this paper the
measure $\mu$ has all its mass concentrated in a finite number of points makes it
possible to simplify very considerably, for this situation, the proofs of the results
mentioned above. In the interest of completeness these simplified proofs are
sketched in Section 4.


3. Example: a problem in perception. The following example, in which n = 2 
and the populations have binomial distributions, has been suggested by G. M. 
Ewing. Given a population of people, to be termed observers, let
$\theta(t) = \theta(t^1, t^2)$ denote the probability that an observer $O$ will fail
to see a certain object which is idealized into a line segment, when the distance
$OM$ from $O$ to the center, $M$, of the segment is $t^1$, and the orientation of the
segment is specified by the angle $t^2$, $0 \le t^2 \le \pi/2$, from the
perpendicular to $OM$ to the line segment. Imagine a series of experiments in
which $t^1$ and $t^2$ are controlled and in which corresponding to each pair
$t_k = (t_k^1, t_k^2)$, $k = 1, 2, \ldots, m$, interpreted as a point in the
$t^1, t^2$ plane, $N(k)$ observers independently report whether they do or do not
see the object. Let $\bar g(t_k)$ denote the fraction of the total number, $N(k)$, of
observers who record not seeing the segment at distance $t_k^1$ with orientation
$t_k^2$. Let $x_{jk}$ ($j = 1, 2, \ldots, N(k)$; $k = 1, 2, \ldots, m$) denote the
stochastic variable which assumes the value 1 if the $j$-th observer at $t_k$ fails
to see the object, otherwise the value 0; then

\[ \bar g(t_k) = \Bigl[\sum_{j=1}^{N(k)} x_{jk}\Bigr] \Big/ N(k). \]

It seems reasonable to suppose that $\theta(t)$ is monotone non-decreasing in $t^1$
and $t^2$ separately.

If $x$ denotes the number of successes in a single trial of an event with
probability $\theta$, then its probability function is
$\theta^x(1 - \theta)^{1-x}$ ($x = 0, 1$). In the present notation,
$\Pr\{x \in A\} = \int_A (2\theta)^x[2(1 - \theta)]^{1-x}\,d\nu$, where $\nu$
attributes measure $\tfrac12$ to each of the points 0 and 1. The distribution of
$x$ is the binomial distribution.
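
As a check on the notation, the function $F$ can be written out for this binomial case. The computation below is only a sketch: it assumes $\tau(v) = \log[(1 - v)/v]$ and $c = \tfrac12$, choices that are not restated in this section but that agree with the relation $-1/\tau'(v) = v(1 - v)$ quoted in Section 6 and with the measure $\nu$ used above.

\[
F(u, v) = \int_{1/2}^{v} (u - z)\,d\tau(z)
        = -\int_{1/2}^{v} \Bigl[\frac{u}{z} - \frac{1 - u}{1 - z}\Bigr]\,dz
        = -u \log(2v) - (1 - u)\log[2(1 - v)],
\]

so that $\exp\{-F[x, \theta]\} = (2\theta)^x[2(1 - \theta)]^{1-x}$, which is the integrand displayed above, and $F(u, \tfrac12) = 0$.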


4. Existence, representations, and uniqueness of maximum likelihood
estimators. For points $t$, $t^*$ in $R_n$, write $t \le t^*$ if no coordinate of $t$
is greater than the corresponding coordinate of $t^*$, and $t \ll t^*$ if each
coordinate of $t$ is less than the corresponding coordinate of $t^*$. The
monotonicity condition on the parameters $\theta_1, \ldots, \theta_m$ can then be
described as requiring that there exist a real-valued function $\theta(t)$ defined
for $t \in R_n$ such that $\theta_k = \theta(t_k)$ and such that
$\theta(t) \le \theta(t^*)$ whenever $t \le t^*$.

The set $E$ of points $t_k$ corresponding to populations from which sample values
are taken is a finite subset of $n$-space, $R_n$. Let $E$ now be regarded as a
fundamental space of points $t$.

If $t \in E$, determine $k$ so that $t = t_k$ and set $Q(t) = N(k)$, so that $Q(t)$ is
the number of sample values taken from the population at $t$. For an arbitrary
function $\theta(t)$ defined on $E$, let

\[ J[\theta(t)] = \sum Q(t)\,F[\bar g(t), \theta(t)], \tag{4} \]

where the summation is over all $t \in E$, and where

\[ \bar g(t) = \sum_{j=1}^{N(k)} g(x_{jk})/N(k) \quad \text{if } t = t_k. \tag{5} \]

The problem is to minimize $J[\theta(t)]$ in the class $\mathfrak{M}$ of functions
$\theta(t)$ such that $\theta(t) \le \theta(t^*)$ whenever $t \le t^*$.

It is clear from (1) that for fixed $u$, $F(u, v)$ is strictly decreasing as a
function of $v$ for $v < u$, and strictly increasing for $v > u$. It will be supposed
that the given function $\bar g(t)$, defined for $t \in E$, is such that
$F[\bar g(t), \bar g(t)] > -\infty$ for $t \in E$, from which it follows that
$F[\bar g(t), \theta(t)] > -\infty$ for $t \in E$, for every function $\theta(t)$.
Since $F[\bar g(t), c] = 0$, and since the function $\theta(t) \equiv c$ is in
$\mathfrak{M}$, the infimum of $J[\theta(t)]$ for $\theta(t) \in \mathfrak{M}$ is
finite (and non-positive).

For a function $\theta(t)$ defined on $E$, set $\theta_k = \theta(t_k)$, $t_k \in E$.
It is a consequence of the above property of $F$ that the search for numbers
$\theta_1, \ldots, \theta_m$ such that $\theta(t)$ is in $\mathfrak{M}$ and minimizes
$J[\theta(t)]$ in $\mathfrak{M}$ can be restricted to the closed, bounded
$m$-dimensional interval

\[ \min_{1 \le k \le m} \bar g_k \le \theta_j \le \max_{1 \le k \le m} \bar g_k,
   \qquad j = 1, 2, \ldots, m, \]

where $\bar g_k = \bar g(t_k)$, $k = 1, 2, \ldots, m$. The problem thus is to choose
$(\theta_1, \ldots, \theta_m)$ in that closed part of this interval determined by the
monotonicity requirement, so as to minimize the sum in (2). The continuity of
this sum in $(\theta_1, \ldots, \theta_m)$ assures the existence of the minimum.

The following definitions represent slight modifications of corresponding
definitions introduced in [3]. Each point $t$ of $E$ determines a lower interval, the
set of all points $t^*$ in $E$ such that $t^* \le t$, and an upper interval, the set of
all points $t^*$ in $E$ such that $t^* \ge t$. A union of lower intervals is a lower
layer; a lower layer $L$ is characterized by the property: if $t$ is in $L$ and $t^*$
is not in $L$ then it is not the case that $t \ge t^*$. A union of upper intervals is
an upper layer, admitting an analogous characterization. The complement in $E$
of a lower layer is an upper layer, and conversely. The union and intersection of
two lower (upper) layers is a lower (upper) layer. If $A \subset E$, $\bar A$ denotes
the complement in $E$ of $A$. If $R$, $S$ are lower layers, then the set
$\bar R \cap S$ is called a layer, and is denoted by $\{R, S\}$. We note that if
$\theta(t)$ is monotone non-decreasing in each of the variables
$t^1, t^2, \ldots, t^n$, and if $a$ is real, then the subset of $E$ on which
$\theta(t) < a$ is a lower layer, the subset of $E$ on which $\theta(t) > a$ is an
upper layer, and the subset of $E$ on which $\theta(t) = a$ is a layer.

If $A \subset E$, let

\[ J[\theta(t); A] = \sum_{t \in A} Q(t)\,F[\bar g(t), \theta(t)], \tag{6} \]

and, if $A$ is not empty ($A \ne \phi$; $\phi$ is the void set), set

\[ M(A) = \sum_{t \in A} Q(t)\,\bar g(t) \Big/ \sum_{t \in A} Q(t). \tag{7} \]

Thus $M(A)$ is a weighted mean of the numbers $\bar g(t)$ at points $t \in A$.

We note the following properties of the function $F$, consequences of the
definition, (1).

If $A \subset E$, $A \ne \phi$, and if $\theta$ is a fixed number in $I$, then

\[ \sum_{t \in A} Q(t)\,F[\bar g(t), \theta]
   = \Bigl[\sum_{t \in A} Q(t)\Bigr] F[M(A), \theta]. \tag{8} \]

(9) For fixed $u \in I$, $F(u, v)$ as a function of $v$ is strictly decreasing for
$v < u$ and strictly increasing for $v > u$.

It follows from (8) and (9) that

\[ \min_{\theta} \sum_{t \in A} Q(t)\,F[\bar g(t), \theta]
   = \Bigl[\sum_{t \in A} Q(t)\Bigr] F[M(A), M(A)], \tag{10} \]

that is, the minimum of the left member of (8) is attained for $\theta = M(A)$.

Let $\hat\theta(t)$ denote the maximum likelihood estimate, the function of the
class $\mathfrak{M}$ minimizing $J$. Let
$\theta^1 < \theta^2 < \cdots < \theta^r$ be the $r$ values $\hat\theta(t)$ assumes,
and let $E^i$ denote the subset of $E$ on which $\hat\theta(t) = \theta^i$
($i = 1, 2, \ldots, r$).

Lemma 1. $M(E^i) = \theta^i$ ($i = 1, 2, \ldots, r$). We have
$\theta^i \ge M(E^i)$, for it is clear from (8)-(10) that if $\theta^i < M(E^i)$
then it would be possible to decrease $J$ by increasing $\hat\theta(t)$ on $E^i$
while preserving the monotonicity of $\hat\theta(t)$. A similar remark shows that
$\theta^i \le M(E^i)$.

Let $S^i$ denote that subset of $E$ (a lower layer) on which
$\hat\theta(t) \le \theta^i$, and set $S^0 = \phi$ (the void set). We have
$E^i = \{S^{i-1}, S^i\} \subset S^i$ ($i = 1, 2, \ldots, r$).

Lemma 2. If $\{R, S^i\} \ne \phi$ then $M\{R, S^i\} \le M(E^i)$. If
$\{S^{i-1}, S\} \ne \phi$, then $M\{S^{i-1}, S\} \ge M(E^i)$ ($i = 1, 2, \ldots, r$).

For if $M\{R, S^i\}$ were greater than $M(E^i)$, it would be possible to decrease
$J$ by increasing $\hat\theta(t)$ on $\{R, S^i\}$ while preserving the monotonicity
of $\hat\theta(t)$. A similar remark establishes the second statement.

THEOREM 1. For $t_0 \in E$ we have

\[ \hat\theta(t_0) = \max_{R}\ \min_{S;\ t_0 \in \{R, S\}} M\{R, S\}, \tag{11} \]

\[ \hat\theta(t_0) = \min_{S}\ \max_{R;\ t_0 \in \{R, S\}} M\{R, S\}, \tag{12} \]

\[ \hat\theta(t_0) = \max_{R;\ t_0 \in \bar R}\ \min_{S} M\{R, S\}, \tag{13} \]

\[ \hat\theta(t_0) = \min_{S;\ t_0 \in S}\ \max_{R} M\{R, S\}. \tag{14} \]

PROOF. Determine $i$ so that $t_0 \in E^i$; then $\hat\theta(t_0) = \theta^i$. We have

\[ \min_{S;\ t_0 \in \{R, S\}} M\{R, S\} \le M\{R, S^i\} \]

if $t_0 \in \bar R$, therefore

\[ \max_{R}\ \min_{S;\ t_0 \in \{R, S\}} M\{R, S\}
   \le \max_{R;\ t_0 \in \bar R} M\{R, S^i\} = M(E^i) = \theta^i \]

by Lemmas 2 and 1. Also

\[ \max_{R}\ \min_{S;\ t_0 \in \{R, S\}} M\{R, S\}
   \ge \min_{S;\ t_0 \in \{S^{i-1}, S\}} M\{S^{i-1}, S\} = M(E^i) = \theta^i \]

by Lemmas 2 and 1. The two inequalities yield (11), and the proofs of (12)-(14)
are similar. (The above proof of Theorem 1 through Lemmas 1 and 2 is a further
simplified proof suggested by the referee; it provides an alternative proof of
Theorem 2.2 in the preceding paper.)

Since any of the formulas in Theorem 1 determines the value of a minimizing
function $\hat\theta(t)$ at an arbitrary point of $E$, the minimizing function is
unique on $E$.
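
For a small set $E$, formula (11) can be evaluated directly by enumerating every lower layer. The following Python sketch is an illustration under stated assumptions, not a method given in the paper: the function names are hypothetical, the points of $E$ are assumed distinct, and the exhaustive enumeration is practical only for a handful of points. It reads $\{R, S\}$ as the set difference $S \setminus R$ and $M(A)$ as the mean of $\bar g$ weighted by $Q(t) = N(k)$.

```python
from itertools import combinations

def leq(s, t):
    """s <= t in the paper's sense: no coordinate of s exceeds that of t."""
    return all(a <= b for a, b in zip(s, t))

def lower_layers(points):
    """Enumerate every lower layer of a small finite set E (as index sets)."""
    m = len(points)
    layers = []
    for size in range(m + 1):
        for subset in combinations(range(m), size):
            chosen = frozenset(subset)
            # A lower layer contains, with each point, every point below it.
            if all(j in chosen
                   for i in chosen
                   for j in range(m) if leq(points[j], points[i])):
                layers.append(chosen)
    return layers

def weighted_mean(indices, g_bar, weights):
    """M(A): mean of g_bar over A with weights Q(t) = N(k)."""
    total = sum(weights[i] for i in indices)
    return sum(weights[i] * g_bar[i] for i in indices) / total

def max_min_estimate(points, g_bar, weights):
    """Formula (11): max over R of min over S, with t0 in {R, S} = S minus R."""
    layers = lower_layers(points)
    estimate = []
    for k in range(len(points)):
        best = max(
            min(weighted_mean(S - R, g_bar, weights) for S in layers if k in S)
            for R in layers if k not in R
        )
        estimate.append(best)
    return estimate

# Four points of a 2 x 2 array with one sample value weight each (made-up data).
grid = [(0, 0), (0, 1), (1, 0), (1, 1)]
fit = max_min_estimate(grid, [0.2, 0.6, 0.4, 0.5], [1, 1, 1, 1])
```

In this made-up example the observed fractions at (0, 1) and (1, 1) violate monotonicity, and the formula pools them to their common mean while leaving the other two values unchanged.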


5. Calculation of minimizing function.

Lemma 3. If $S \supset S^i$, $\{S^i, S\} \ne \phi$, then
$M\{S^{i-1}, S\} > M\{S^{i-1}, S^i\} = M(E^i) = \theta^i$; if $R \subset S^{i-1}$,
$\{R, S^{i-1}\} \ne \phi$, then

\[ M\{R, S^i\} < M(E^i) = \theta^i \qquad (i = 1, 2, \ldots, r). \]

For by Lemma 2 $M\{S^{i-1}, S\} \ge M(E^i)$; if $M\{S^{i-1}, S\} = M(E^i)$ then $J$
could be decreased by setting $\theta(t) = M(E^i) = \theta^i$ on $\{S^{i-1}, S\}$.
The proof of the second statement is similar.

It follows from Lemmas 2 and 3 that $E^1 = S^1$ is the maximal lower layer of
minimal mean: that is, the union of those lower layers over which the mean is
the minimum assumed by the means of all lower layers. The layers $E^i$
($i = 2, 3, \ldots, r$) may be determined successively by the use of the criterion:
$E^i = \{S^{i-1}, S^i\}$ is the maximal layer of minimal mean among layers
$\{S^{i-1}, S\}$ ($i = 2, 3, \ldots, r$). Similarly,
$E^r = \{S^{r-1}, S^r\} = \{S^{r-1}, E\}$ is the maximal upper layer of maximal
mean, and the layers $E^i$ ($i = r - 1, r - 2, \ldots, 1$) may be determined
successively by the use of the criterion: $E^i = \{S^{i-1}, S^i\}$ is the maximal
layer of maximum mean among layers $\{R, S^i\}$.

For the one-dimensional case (n = 1), the method of calculation discussed in 
[1] is also available. 
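
The successive determination of the layers $E^1, E^2, \ldots$ described in this section can be sketched in Python as follows. This is a brute-force illustration for small $E$ only, assuming distinct points; the function names are hypothetical, and ties between layer means are detected with a floating-point tolerance, which is an implementation choice rather than part of the paper.

```python
from itertools import combinations
import math

def leq(s, t):
    """s <= t: no coordinate of s exceeds the corresponding coordinate of t."""
    return all(a <= b for a, b in zip(s, t))

def lower_layers(points):
    """All lower layers of a small finite set E, as frozensets of indices."""
    m = len(points)
    layers = []
    for size in range(m + 1):
        for subset in combinations(range(m), size):
            chosen = frozenset(subset)
            if all(j in chosen
                   for i in chosen
                   for j in range(m) if leq(points[j], points[i])):
                layers.append(chosen)
    return layers

def fit_by_minimal_mean_layers(points, g_bar, weights):
    """Build S^1, S^2, ... in turn; each E^i = {S^(i-1), S^i} is the maximal
    layer of minimal mean among the layers {S^(i-1), S}."""
    layers = lower_layers(points)
    everything = frozenset(range(len(points)))

    def mean(indices):
        total = sum(weights[i] for i in indices)
        return sum(weights[i] * g_bar[i] for i in indices) / total

    fitted = [None] * len(points)
    previous = frozenset()                      # S^0 is the void set
    while previous != everything:
        candidates = [S for S in layers if S > previous]   # proper supersets
        lowest = min(mean(S - previous) for S in candidates)
        # The maximal layer is the union of all layers attaining the minimum.
        current = frozenset().union(
            *[S for S in candidates
              if math.isclose(mean(S - previous), lowest,
                              rel_tol=1e-12, abs_tol=1e-12)])
        for i in current - previous:
            fitted[i] = lowest                  # theta^i = M(E^i), by Lemma 1
        previous = current
    return fitted

grid = [(0, 0), (0, 1), (1, 0), (1, 1)]
fit = fit_by_minimal_mean_layers(grid, [0.2, 0.6, 0.4, 0.5], [1, 1, 1, 1])
```

On this made-up 2 x 2 example the procedure finds three layers in turn, with means 0.2, 0.4 and 0.55, which agrees with a direct evaluation of the max-min formula of Theorem 1 on the same data.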


6. Consistency. In order to discuss the consistency of the maximum likelihood
estimator, let again $R_n$ be the fundamental space over which $t$ ranges, and let
the distribution parameter $\theta = \theta(t)$ be defined (though unknown to the
investigator) on an open subset $\mathcal{O}$ of $R_n$, and be monotone
non-decreasing in each of the variables $t^i$ ($i = 1, 2, \ldots, n$). We shall
suppose that the set $E$ of points $t_k$ ($k = 1, 2, \ldots, m$) corresponding to
populations from which sample values are to be taken is in $\mathcal{O}$. For
$t \in E$, let $\hat\theta(t)$ denote, as in Sections 4 and 5, the maximum
likelihood estimator of $\theta(t)$. Let $\hat\theta(t)$ also denote an extension to
$\mathcal{O}$ of $\hat\theta(t)$ defined on $E$ which preserves the monotonicity
property; for the sake of definiteness, let
$\hat\theta(t) = \max \hat\theta(t_k)$ for $t_k \in E$, $t_k \le t$. For fixed $t$, the
value of $\hat\theta(t)$ depends on the sample values from populations at points
of $E$; accordingly this value is a stochastic variable, which we denote by
$\hat{\boldsymbol\theta}(t)$.

The content of the consistency theorem for the one-dimensional case ($n = 1$)
may be stated roughly as follows: at a point, $t_0$, of continuity of $\theta(t)$, the
estimator $\hat{\boldsymbol\theta}(t_0)$ will with high probability be near
$\theta(t_0)$, provided enough sample values are taken at points $t$ near $t_0$
such that $t \ll t_0$, and enough sample values are taken at points $t$ near $t_0$
such that $t \gg t_0$. This consistency theorem generalizes that given in [1] for
the binomial case.

As in Section 2 we suppose that $x$ is a random variable having a density
function $\exp\{-F[g(x), \theta]\}$ with respect to the measure $\nu$, independent
of $\theta$, on the class of Borel sets, where $g(x)$ is a Baire function. The
density function of the stochastic variable $y = g(x)$ with respect to the induced
measure $\kappa = \nu g^{-1}$ is $\exp\{-F(y, \theta)\}$; for a Borel set $B$ on the
$y$ axis,

\[ \Pr\{y \in B\} = \int_B \exp\{-F(y, \theta)\}\,d\kappa \]

([6], p. 163). In particular,

\[ \int \exp\{-y[\tau(\theta) - \tau(c)]\}\,d\kappa = \exp \gamma(\theta), \]

where $\gamma(\theta) = -\int_c^{\theta} z\,d\tau(z)$. With the change of parameter
$\tau = \tau(\theta)$, this integral becomes a bilateral Laplace transform,
representing an analytic function of $\tau$ in its strip of convergence ([9], p. 240;
[5], p. 67). The above hypotheses on $x$ in effect imply the convergence of the
integral for $\theta \in I$ and hence the analyticity of $\gamma(\theta)$ and
$\tau(\theta)$ on the interior of $I$. Straightforward calculations then yield

\[ E(y) = \theta, \qquad \operatorname{Var} y = -1/\tau'(\theta). \tag{15} \]
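
Relation (15) can be checked numerically for two of the families mentioned in this paper. The Python sketch below is illustrative only: the function names are hypothetical, the Poisson series is truncated at a fixed number of terms, and the binomial case assumes the density $(2\theta)^x[2(1 - \theta)]^{1-x}$ with respect to the measure assigning $\tfrac12$ to each of the points 0 and 1.

```python
import math

def poisson_moments(theta, terms=150):
    """Total mass, mean and variance of y = x under the density
    exp{-F[x, theta]} = theta**x * exp(-(theta - 1)) taken with respect to
    the measure nu({x}) = e**-1 / x! on the non-negative integers."""
    total = mean = second = 0.0
    for x in range(terms):
        log_nu = -1.0 - math.lgamma(x + 1)                 # log of e^{-1}/x!
        log_density = x * math.log(theta) - (theta - 1.0)  # equals -F[x, theta]
        p = math.exp(log_nu + log_density)
        total += p
        mean += x * p
        second += x * x * p
    return total, mean, second - mean * mean

def binomial_moments(theta):
    """Same quantities for a single trial, measure 1/2 at each of 0 and 1."""
    total = mean = second = 0.0
    for x in (0, 1):
        p = 0.5 * (2 * theta) ** x * (2 * (1 - theta)) ** (1 - x)
        total += p
        mean += x * p
        second += x * x * p
    return total, mean, second - mean * mean

# Poisson: tau(v) = -log v, so -1/tau'(v) = v.
p_total, p_mean, p_var = poisson_moments(2.5)
# Binomial: -1/tau'(v) = v(1 - v).
b_total, b_mean, b_var = binomial_moments(0.3)
```

In both cases the total mass is 1, the mean equals $\theta$, and the variance equals $-1/\tau'(\theta)$, up to rounding and truncation error.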

Let $-1/\tau'[\theta(t)]$ be bounded:

\[ -1/\tau'[\theta(t)] \le C \quad \text{for } t \in \mathcal{O}. \tag{16} \]

For the binomial distribution ($-1/\tau'(v) = v(1 - v)$, $0 \le v \le 1$) and the
normal with mean $\theta$ and standard deviation 1 ($-1/\tau' = 1$) for example,
this is no restriction. For the Poisson ($-1/\tau'(v) = v$, $0 < v < \infty$) and the
normal with mean 0, variance $\theta$ ($-1/\tau'(v) = 2v^2$), (16) requires that
$\theta(t)$ be bounded in $\mathcal{O}$.
For given $\varepsilon > 0$ and $\eta > 0$, let $K = K(\varepsilon, \eta)$ be a positive integer such that


> 1/? + 1/4K < &*n/1280. 
v=K 


THEOREM 2. Let $n = 1$. Let the above conditions on the family of populations and
on the function $\theta(t)$ be satisfied. Let $t_0$ be a continuity point in
$\mathcal{O}$ of $\theta(t)$. Let $t'$, $t''$ be so chosen that $t' < t_0 < t''$ and
that $|\theta(t) - \theta(t_0)| < \varepsilon/2$ for $t' \le t \le t''$. Then

\[ \Pr\{|\hat{\boldsymbol\theta}(t_0) - \theta(t_0)| < \varepsilon\} > 1 - \eta \]

provided that at least $K = K(\varepsilon, \eta)$ sample values are taken from
populations at points in $(t', t_0]$ and at least $K$ from populations at points in
$[t_0, t'')$.

The proof requires only minor modifications in the proof of the special case in
[1], and is omitted.

It is not difficult to see that for $n > 1$ the maximum likelihood estimator does
not have precisely the consistency property which is the obvious analog of the
consistency for $n = 1$. It is not true that, provided enough sample values are
taken from populations at points near $t_0$ and "above" $t_0$, and enough from
populations at points near $t_0$ and "below" $t_0$, the estimator
$\hat{\boldsymbol\theta}(t_0)$ will assume a value near $\theta(t_0)$ with high
probability. This can be illustrated with an example in which the populations
are binomial: each of the sample stochastic variables $y_{jk}$
($j = 1, 2, \ldots, N(k)$) corresponding to a population at $t_k$ assumes the value
1 with probability $\theta(t_k) = \tfrac12$ and the value 0 with probability
$1 - \theta(t_k) = \tfrac12$; let $\theta(t) \equiv \tfrac12$. Let $t_0$ be a point of
$\mathcal{O}$. Let $E$ be an arbitrary finite set of points in $\mathcal{O}$, and
let the total number of sample values to be taken from populations at points of
$E$ be $H$. Let $S_0$ be a union of $n$-dimensional intervals, each of the form
$\{t: t^1 \le a^1, t^2 \le a^2, \ldots, t^n \le a^n\}$, where
$a = (a^1, a^2, \ldots, a^n) \in R_n$. Let the boundary of $S_0$ contain no segment
parallel to a coordinate axis. Let $t_0 \in S_0$, and let $S_0$ be so chosen that
$t_k \notin S_0$ for every point $t_k \in E$ such that $t_k \gg t_0$. ($S_0$ can be
chosen, for example, so that its boundary is a hyperplane separating $t_0$ from
points $t_k \gg t_0$.) Now add to $E$ points $t^* \gg t_0$, on the boundary of $S_0$,
obtaining a set $E^*$ containing $E$, and take one sample value from each of the
points added. By adding enough such points to $E$, the probability can be made
arbitrarily near 1 that at least $9H$ of the corresponding sample values will be 0.
Let $S_1$ be a lower layer (with reference to the extended set $E^*$) containing
those points, and only those among the points added, where the sample value is
0. Then if $t_0 \in \{R, S_1\}$ we have $M\{R, S_1\} \le H/10H = 1/10$, hence

\[ \hat\theta(t_0) = \max_{R}\ \min_{S;\ t_0 \in \{R, S\}} M\{R, S\} \le 1/10. \]

It is clear that adding points to $E$ near $t_0$ need not bring $\hat\theta(t_0)$
near $\theta(t_0) = \tfrac12$ with high probability, but can, indeed, if the points
are added in a suitable way, bring $\hat\theta(t_0)$ arbitrarily near 0 with
arbitrarily high probability.
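
A simplified version of this construction can be simulated. The Python sketch below is an illustration under assumptions that are narrower than the text: $n = 2$, the original set $E$ consists of $t_0$ alone with $H$ sample values, and the added points are mutually incomparable points lying strictly above $t_0$ (for instance on an anti-diagonal line segment). In that special configuration the only lower layer not containing $t_0$ is the void set, every lower layer containing $t_0$ is $t_0$ together with some of the added points, and the max-min formula reduces to the closed form used in the code. The function name and the seed are hypothetical.

```python
import random

def paradox_estimate(h, added, seed=0):
    """Estimate at t0 when E = {t0} carries h fair Bernoulli sample values and
    `added` mutually incomparable points above t0 carry one value each.

    The minimizing lower layer consists of t0 and every added point whose
    sample value is 0, so the estimate is (ones at t0) / (h + zeros added).
    """
    rng = random.Random(seed)
    ones_at_t0 = sum(rng.random() < 0.5 for _ in range(h))
    zeros_added = sum(rng.random() >= 0.5 for _ in range(added))
    return ones_at_t0 / (h + zeros_added), zeros_added

estimate, zeros = paradox_estimate(h=20, added=1000)
```

Whenever the number of zeros among the added points is at least $9H$, the estimate at $t_0$ cannot exceed $1/10$, although the true value is $\tfrac12$.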

It appears not unreasonable to expect that the consistency property will hold
if the points of $E$ are required, for example, to constitute a rectangular array.
To decide whether or not this is the case would appear to require a study of
properties of maximum means over layers in rectangular arrays, with a view to
obtaining an analogue of the theorem of Kolmogorov on which the proof of
Theorem 2 is based (cf. [1]).

The above example suggests the possible desirability of grouping together 
observations made at closely adjacent points, to avoid the paradox of the example 
in which more observations may be made to yield less precise results. 

It should perhaps be mentioned that if the points of $E$ are held fixed while the
size of the sample from the population at each is increased indefinitely, then it
follows from the strong law of large numbers that, with probability 1,
$\hat{\boldsymbol\theta}(t)$ will approach $\theta(t)$ at each point of $E$.


ERROR ESTIMATES FOR CERTAIN PROBABILITY LIMIT THEOREMS¹

By J. M. SHAPIRO

Ohio State University

1. Summary and Introduction. Consider a sequence of independent random
variables $x_1, x_2, \ldots, x_n, \ldots$ with mean 0 and variance $\sigma_n^2$. Let
$S_n = (x_1 + \cdots + x_n)/s_n$ where $s_n^2 = \sigma_1^2 + \cdots + \sigma_n^2$.
The classical forms of the central limit theorem state that, with certain
assumptions, the distribution function $F_n(x)$ of $S_n$ approaches the Gaussian
distribution

\[ \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\,du. \]

Berry [1] and Esseen [3] have studied the behavior of

\[ M_n = \sup_{-\infty < x < \infty} |F_n(x) - \Phi(x)| \]

and in their main theorems have obtained bounds on $M_n$ which involve the
moments of $x_k$ through the third.
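
To make the quantity $M_n$ concrete, it can be computed exactly for a simple case. The Python sketch below is an illustration, not part of the paper: it takes the $x_k$ to be centered Bernoulli variables with a common success probability `p` (an assumed example), so that $S_n$ is a standardized binomial variable, and it uses the fact that the supremum of $|F_n - \Phi|$ for a step function $F_n$ is attained at a jump, either as a left limit or as the value at the jump.

```python
import math

def normal_cdf(x):
    """Gaussian distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sup_distance_bernoulli(n, p=0.5):
    """M_n = sup_x |F_n(x) - Phi(x)| for the standardized sum of n
    independent Bernoulli(p) variables."""
    s_n = math.sqrt(n * p * (1 - p))
    cdf = 0.0
    worst = 0.0
    for k in range(n + 1):
        x = (k - n * p) / s_n
        worst = max(worst, abs(cdf - normal_cdf(x)))   # left limit at the jump
        cdf += math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        worst = max(worst, abs(cdf - normal_cdf(x)))   # value at the jump
    return worst

m_10 = sup_distance_bernoulli(10)
m_100 = sup_distance_bernoulli(100)
```

For the symmetric case the largest discrepancy occurs at the central jump, and the computed values decrease as `n` grows, in line with the bounds of Berry and Esseen.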

More generally consider a system of random variables $(x_{nk})$,
$k = 1, 2, \ldots, k_n$; $n = 1, 2, \ldots$ such that for each $n$, the variables
$x_{n1}, \ldots, x_{nk_n}$ are independent. Let $S_n = x_{n1} + \cdots + x_{nk_n}$ and
again let $F_n(x)$ be the distribution function of $S_n$. From a well known theorem
of Khintchine [5] it follows that if the random variables $x_{nk}$ are
infinitesimal (i.e.,
$\lim_{n \to \infty} \max_{1 \le k \le k_n} P\{|x_{nk}| > \varepsilon\} = 0$ for
every $\varepsilon > 0$) then the class of possible limiting distributions of
$F_n(x)$ coincides with the class of infinitely divisible distributions.

Let $F(x)$ be any infinitely divisible distribution function and let
$M_n = \sup_{-\infty < x < \infty} |F_n(x) - F(x)|$. In this paper we obtain bounds
on $M_n$ in the case where $F(x)$ and the $x_{nk}$ have finite second moments. It is
shown that under necessary and sufficient conditions for $F_n(x)$ to approach
$F(x)$, the bounds on $M_n$ obtained approach zero as $n$ becomes infinite.

Throughout the paper, given the system $(x_{nk})$ we shall let $F_{nk}(x)$,
$\varphi_{nk}(t)$, $\mu_{nk}$, and $\sigma_{nk}^2$ be the distribution function,
characteristic function, mean, and variance respectively of $x_{nk}$, and
$F_n(x)$, $\varphi_n(t)$, $\mu_n$, and $\sigma_n^2$ have the same meaning for the
random variable $S_n$.


2. Some Preliminary Lemmas. The following lemmas will be used to obtain 
the general result in the next section. 

Lemma 1. Let $z_1$ and $z_2$ be any two complex numbers such that $|z_1| \le 1$
and $|z_2| \le 1$. Then $|z_1 - z_2| \le |\log z_1 - \log z_2|$.
This follows from the mean value theorem for complex functions (see [2]
page 115).

Received March 11, 1955.

¹ This paper is the revised form of a Doctor's dissertation accepted by the University of
Minnesota, 1954, and presented to the American Mathematical Society December 29, 1954.

Lemma 2. Let

\[ f(t, x) = (e^{itx} - 1 - itx)/x^2 \]

for real $x$ and $t$. Then for all $x$, $|f(t, x)| \le \tfrac12 t^2$ and
$|\partial f(t, x)/\partial x| \le \tfrac16 |t|^3$.
This follows from the fact that for real $u$,

\[ e^{iu} = \sum_{k=0}^{n} \frac{(iu)^k}{k!} + \theta\,\frac{|u|^{n+1}}{(n+1)!}
   \qquad \text{where } |\theta| \le 1. \]

COROLLARY. $|f(t, x) - f(t, y)| \le \tfrac14 |t|^3 |x - y|$.

PROOF. We have $f(t, x) = (\cos tx - 1)/x^2 + i(\sin tx - tx)/x^2 = R(t, x) + iI(t, x)$.
By the law of the mean we have
$|R(t, x) - R(t, y)| \le \tfrac16 |t|^3 \cdot |x - y|$ and the same inequality holds
for $|I(t, x) - I(t, y)|$. Thus
$|f(t, x) - f(t, y)| \le \sqrt{2} \cdot \tfrac16 \cdot |t|^3 \cdot |x - y| < \tfrac14 |t|^3 \cdot |x - y|$.
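
The two bounds used later can be spot-checked numerically. The Python sketch below is only a sanity check on a finite grid, not a proof; it assumes the bounds read $|f(t, x)| \le \tfrac12 t^2$ and $|f(t, x) - f(t, y)| \le \tfrac14 |t|^3 |x - y|$, defines $f(t, 0)$ by its limiting value $-t^2/2$, and uses a small tolerance for rounding error. The function names are hypothetical.

```python
import cmath

def f(t, x):
    """f(t, x) = (exp(itx) - 1 - itx) / x**2, with the limit -t**2/2 at x = 0."""
    if x == 0:
        return complex(-0.5 * t * t)
    return (cmath.exp(1j * t * x) - 1 - 1j * t * x) / (x * x)

def check_lemma2(ts, xs, tol=1e-9):
    """Return True if both bounds hold at every grid point (up to tol)."""
    ok = True
    for t in ts:
        for x in xs:
            ok = ok and abs(f(t, x)) <= 0.5 * t * t + tol
            for y in xs:
                bound = 0.25 * abs(t) ** 3 * abs(x - y) + tol
                ok = ok and abs(f(t, x) - f(t, y)) <= bound
    return ok

grid_x = [k / 4 for k in range(-40, 41)]     # x from -10 to 10 in steps of 1/4
grid_t = [-3.0, -1.0, 0.5, 2.0, 5.0]
bounds_hold = check_lemma2(grid_t, grid_x)
```

The grid avoids very small nonzero values of `x`, where the direct formula for `f` would lose accuracy through cancellation.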

Now let $F(x)$ be any infinitely divisible distribution function with mean $\mu$,
variance $\sigma^2$, and characteristic function $\varphi(t)$. According to
Kolmogorov's formula [6] for the characterization of infinitely divisible
distributions with finite variance, we know that

\[ \log \varphi(t) = i\mu t + \int_{-\infty}^{\infty} (e^{itx} - 1 - itx)\,\frac{1}{x^2}\,dG(x) \tag{2.1} \]

where $G(x)$ is a bounded nondecreasing function. If we impose that
$G(-\infty) = 0$ and that $G(x)$ is right continuous then the representation of
$\log \varphi(t)$ by this formula is unique. (Also if $G(-\infty) = 0$ then
$G(+\infty) = \sigma^2$.)
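
Two standard cases may help fix ideas; they are added here as illustrations and are not taken from the paper. In both, the integrand at $x = 0$ is understood as its limiting value $-t^2/2$.

\[
\text{Normal } (\mu, \sigma^2):\quad
G(x) = \begin{cases} 0, & x < 0 \\ \sigma^2, & x \ge 0 \end{cases}
\qquad\Longrightarrow\qquad
\log \varphi(t) = i\mu t - \tfrac12 \sigma^2 t^2 .
\]

\[
\text{Poisson } (\lambda):\quad
\mu = \lambda,\quad
G(x) = \begin{cases} 0, & x < 1 \\ \lambda, & x \ge 1 \end{cases}
\qquad\Longrightarrow\qquad
\log \varphi(t) = i\lambda t + \lambda (e^{it} - 1 - it) = \lambda (e^{it} - 1).
\]

In each case $G(+\infty)$ equals the variance, in agreement with the parenthetical remark above.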

Let $A > 0$ be such that $-A$ and $A$ are continuity points of $G(x)$ and let
$0 < \delta \le 2A$. Define

\[ m = m(A, \delta) = [2A/\delta] + 1 \tag{2.2} \]

where $[r]$ is the greatest integer function. Let

\[ -A = x_0 < x_1 < x_2 < \cdots < x_m = A \tag{2.3} \]

be such that $x_i$ ($i = 0, 1, 2, \ldots, m$) is a continuity point of $G(x)$ and

\[ \max_{i = 1, 2, \ldots, m} |x_i - x_{i-1}| \le \delta. \]

Let

\[ \sum_{k=1}^{k_n} \int_{-\infty}^{x} u^2\,dF_{nk}(u + \mu_{nk}) = G_n(x) \tag{2.4} \]

and

\[ E(n, t, m(A, \delta)) = \tfrac14 \delta |t|^3 (\sigma_n^2 + \sigma^2)
   + \frac{t^2}{2} \sum_{i=0}^{m} |G_n(x_i) - G(x_i)|
   + \frac{2|t|}{A}\,[G_n(+\infty) - G_n(A) + G(+\infty) - G(A) + G_n(-A) + G(-A)]. \]

With this notation and with $f(t, x)$ defined as in Lemma 2 we have the following
lemma.

Lemma 3. $|\int_{-\infty}^{\infty} f(t, x)\,d[G_n(x) - G(x)]| \le E(n, t, m(A, \delta))$
for any $A > 0$, $0 < \delta \le 2A$ and any choice of $x_0, x_1, \ldots, x_m$
satisfying (2.3).

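PROOF. First let

\[ \xi_i = \begin{cases} x_i, & i \text{ odd} \\ x_{i-1}, & i \text{ even} \end{cases} \]

and consider (for $n = 0, 1, \ldots$ with $G(x) = G_0(x)$),

\[ \Bigl| \int_{-A}^{A} f(t, x)\,dG_n(x) - \sum_{i=1}^{m} f(t, \xi_i)[G_n(x_i) - G_n(x_{i-1})] \Bigr|
   = \Bigl| \sum_{i=1}^{m} \int_{x_{i-1}}^{x_i} (f(t, x) - f(t, \xi_i))\,dG_n(x) \Bigr|
   \le \tfrac14 |t|^3 \cdot \delta \cdot [G_n(x_m) - G_n(x_0)] \]

by the corollary to Lemma 2. Now $G_n(x)$ is nonnegative and nondecreasing and
$G_n(+\infty) = \sigma_n^2$, so that $[G_n(x_m) - G_n(x_0)] \le \sigma_n^2$ where we
define $\sigma_0^2 = \sigma^2$. Therefore

\[ \Bigl| \int_{-A}^{A} f(t, x)\,dG_n(x) - \sum_{i=1}^{m} f(t, \xi_i)[G_n(x_i) - G_n(x_{i-1})] \Bigr|
   \le \tfrac14 |t|^3 \delta \sigma_n^2. \tag{2.5} \]

Now consider for $n \ne 0$

\[
\begin{aligned}
\Bigl| \int_{-A}^{A} f(t, x)\,dG_n(x) - \int_{-A}^{A} f(t, x)\,dG(x) \Bigr|
 &\le \Bigl| \int_{-A}^{A} f(t, x)\,dG_n(x) - \sum_{i=1}^{m} f(t, \xi_i)[G_n(x_i) - G_n(x_{i-1})] \Bigr| \\
 &\quad + \Bigl| \sum_{i=1}^{m} f(t, \xi_i)[G_n(x_i) - G_n(x_{i-1})] - \sum_{i=1}^{m} f(t, \xi_i)[G(x_i) - G(x_{i-1})] \Bigr| \\
 &\quad + \Bigl| \sum_{i=1}^{m} f(t, \xi_i)[G(x_i) - G(x_{i-1})] - \int_{-A}^{A} f(t, x)\,dG(x) \Bigr| \\
 &\le \tfrac14 |t|^3 \delta (\sigma_n^2 + \sigma^2)
   + \Bigl| \sum_{i=1}^{m} f(t, \xi_i)[G_n(x_i) - G(x_i) - G_n(x_{i-1}) + G(x_{i-1})] \Bigr|.
\end{aligned}
\tag{2.6}
\]

Now from Lemma 2 $|f(t, x)| \le \tfrac12 t^2$ so that,

\[
\Bigl| \sum_{i=1}^{m} f(t, \xi_i)[G_n(x_i) - G(x_i) + G(x_{i-1}) - G_n(x_{i-1})] \Bigr|
\le \frac{t^2}{2} \Bigl\{ |G(x_0) - G_n(x_0)| + |G(x_m) - G_n(x_m)|
   + 2 \sum_{i=1}^{[(m-2)/2]} |G_n(x_{2i}) - G(x_{2i})| \Bigr\}.
\tag{2.7}
\]

Now if we let

\[ \xi_i = \begin{cases} x_i, & i \text{ even} \\ x_{i-1}, & i \text{ odd} \end{cases} \]

we see that (2.6) still holds and that (2.7) becomes

\[
\Bigl| \sum_{i=1}^{m} f(t, \xi_i)[G_n(x_i) - G(x_i) + G(x_{i-1}) - G_n(x_{i-1})] \Bigr|
\le \frac{t^2}{2} \Bigl\{ |G(x_0) - G_n(x_0)| + |G(x_m) - G_n(x_m)|
   + 2 \sum_{i=1}^{[(m-1)/2]} |G_n(x_{2i-1}) - G(x_{2i-1})| \Bigr\}.
\tag{2.8}
\]

Combining (2.6), (2.7), and (2.8) we find

\[
\Bigl| \int_{-A}^{A} f(t, x)\,dG_n(x) - \int_{-A}^{A} f(t, x)\,dG(x) \Bigr|
\le \tfrac14 |t|^3 \delta (\sigma_n^2 + \sigma^2)
   + \frac{t^2}{2} \sum_{i=0}^{m} |G_n(x_i) - G(x_i)|.
\tag{2.9}
\]

Consider now $\bigl(\int_{-\infty}^{-A} + \int_{A}^{\infty}\bigr) f(t, x)\,d[G_n(x) - G(x)]$.
We note that

\[ |f(t, x)| = \Bigl| \frac{it}{x^2} \int_0^x (e^{itu} - 1)\,du \Bigr| \le \frac{2|t|}{|x|} \]

so that

\[
\Bigl| \Bigl( \int_{-\infty}^{-A} + \int_{A}^{\infty} \Bigr) f(t, x)\,dG_n(x) \Bigr|
\le 2|t| \Bigl( \int_{-\infty}^{-A} + \int_{A}^{\infty} \Bigr) \frac{dG_n(x)}{|x|}
\le \frac{2|t|}{A}\,[G_n(+\infty) - G_n(A) + G_n(-A)], \qquad n = 0, 1, 2, \ldots.
\]

Thus

\[
\Bigl| \Bigl( \int_{-\infty}^{-A} + \int_{A}^{\infty} \Bigr) f(t, x)\,d[G_n(x) - G(x)] \Bigr|
\le \frac{2|t|}{A}\,[G_n(+\infty) - G_n(A) + G(+\infty) - G(A) + G_n(-A) + G(-A)].
\tag{2.10}
\]

By (2.9) and (2.10) we see that

\[
\Bigl| \int_{-\infty}^{\infty} f(t, x)\,d[G_n(x) - G(x)] \Bigr|
\le \tfrac14 |t|^3 \delta (\sigma_n^2 + \sigma^2)
   + \frac{t^2}{2} \sum_{i=0}^{m} |G_n(x_i) - G(x_i)|
   + \frac{2|t|}{A}\,[G_n(+\infty) - G_n(A) + G(+\infty) - G(A) + G_n(-A) + G(-A)]
= E(n, t, m(A, \delta)).
\]
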
Q.E.D.

3. The General Result. The bounds we shall obtain on $M_n$ will be derived
from bounds on the characteristic functions of $F_n(x)$ and $F(x)$ by using the
following two theorems of Esseen [3].

THEOREM 1. Let $M(x)$ be a nondecreasing function, $N(x)$ a real function of
bounded variation on the whole real axis such that $N'(x)$ exists and
$|N'(x)| \le B < \infty$, $M(-\infty) = N(-\infty) = 0$ and
$N(+\infty) = M(+\infty)$. Let $m(t)$ and $n(t)$ be the corresponding
Fourier-Stieltjes transforms and for any $T > 0$ let
$\varepsilon = \int_{-T}^{T} |(n(t) - m(t))/t|\,dt$. Then to every number $k > 1$
there corresponds a finite positive number $c(k)$, only depending on $k$ such
that $|N(x) - M(x)| \le k \cdot \varepsilon/2\pi + c(k) \cdot B/T$.

THEOREM 2. Let $M(x)$ be a nondecreasing step function and $N(x)$ a real function
of bounded variation on the whole real axis such that

1) $M(-\infty) = N(-\infty) = 0$, $M(+\infty) = N(+\infty)$,

2) If $N(x)$ is discontinuous at $x = x_\nu$
($x_\nu < x_{\nu+1}$, $\nu = 0, \pm 1, \pm 2, \ldots$) there exists a constant
$L > 0$ such that $\min (x_{\nu+1} - x_\nu) \ge L$,

3) $|N'(x)| \le B < \infty$ everywhere except when $x = x_\nu$
($\nu = 0, \pm 1, \pm 2, \ldots$),

4) $M(x)$ may be discontinuous only at $x = x_\nu$
($\nu = 0, \pm 1, \pm 2, \ldots$).

Let $m(t)$ and $n(t)$ be the corresponding Fourier-Stieltjes transforms and for
any $T > 0$ let $\varepsilon = \int_{-T}^{T} |(m(t) - n(t))/t|\,dt$. Then to every
number $k > 1$ there correspond two finite positive constants $c_1(k)$ and
$c_2(k)$ only depending on $k$, such that
$|N(x) - M(x)| \le k(\varepsilon/2\pi) + c_1(k) \cdot B/T$, provided that
$T \cdot L \ge c_2(k)$.

Now using the notation of (2.1)—-(2.4) we define 


g(n, m(A, 4)) = [}o2 max onl? + [88(o%, + 0°)" 


- 1/3 
(31) + E > |G@,(2i) - ceo | 


1=0 


+| 4 [G,(+%) — GA) + G(+ 2) — GA) + G,(—A) + G(—A)} 


1/2 
+ 2\u, — ul] , 


This leads to the general theorem. 

THEOREM 3. Let $F(x)$ be any infinitely divisible distribution function with mean
$\mu$ and variance $\sigma^2$ and with corresponding $G(x)$ given by Kolmogorov's
formula (2.1). Let $(x_{nk})$ be a system of random variables, independent within
each row with mean $\mu_{nk}$ and variance $\sigma_{nk}^2$. Let $F_n(x)$ be the
distribution function of $S_n = x_{n1} + \cdots + x_{nk_n}$ and suppose that
$dF(x)/dx$ exists and $|dF(x)/dx| \le B$ for all $x$. Assume that
$\sigma_{nk}^2 \le 1$, $k = 1, 2, \ldots, k_n$. (The assumption
$\sigma_{nk}^2 \le 1$ is really quite weak as will be seen by Lemma 4.) Then it
follows that for any $a > 1$

\[ M_n = \sup_{-\infty < x < \infty} |F_n(x) - F(x)| \le k(a, B)\,g(n, m(A, \delta)) \tag{3.2} \]

where $k(a, B)$ is a constant depending only on $B$ and on $a > 1$.

PROOF. For fixed $n$ we first obtain an estimate on
$|\log \varphi_n(t) - \log \varphi(t)|$, where $\varphi_n(t)$ and $\varphi(t)$ are
the characteristic functions of $F_n(x)$ and $F(x)$ respectively and then use
Lemma 1 and Theorem 1 to obtain the bound on $M_n$. (Since $\varphi(t)$ is the
characteristic function of an infinitely divisible distribution, we know that
$\log \varphi(t)$ is defined. In the course of the proof it will also become clear
that $\log \varphi_n(t)$ is defined.) As in Lemma 2 let
$f(t, x) = (e^{itx} - 1 - itx)/x^2$ and define

\[ \psi_n(t) = it \sum_{k=1}^{k_n} \mu_{nk} + \int_{-\infty}^{\infty} f(t, x)\,dG_n(x). \]

Now $|\log \varphi_n(t) - \log \varphi(t)| \le |\log \varphi_n(t) - \psi_n(t)| + |\psi_n(t) - \log \varphi(t)|$.
Let $F'_{nk}(x) = F_{nk}(x + \mu_{nk})$ and let $\varphi'_{nk}(t)$ be the
corresponding characteristic function. Let
$\alpha_{nk}(t) = \varphi'_{nk}(t) - 1$. Now
$\varphi'_{nk}(t) = 1 + \tfrac12 \theta \sigma_{nk}^2 t^2$ where $|\theta| \le 1$
and therefore

\[ |\alpha_{nk}(t)| = |\theta|\,\tfrac12 \sigma_{nk}^2 t^2. \tag{3.3} \]


Let T, = 1/g(n, m(A, 6)) and assume in the rest of the proof that |{} < T,,. 
Then we see that ja,,(t)| S # and that 


anal) + dni(l) ak 


log gas(t) = ay(t) — ; 


so that 


= |ane(t)|” nk(t)|” ‘ : 
(3.4) log gni(t) — a,(t)| S a |one() S }. |anu(t) ~ £3 la,.(d)|. 
r=? r l 7 One(b) 
Now we note f*. f(t, 2) dG,(x) = >oi%: a(t), so that 


Kn 


(3.5) Valt) = D> (itune + ane(t)). 
k=1 


Also ens(t) = ¢ *, .(t) and thus log ¢,(t) = an (tun. + log gnx(t)). 
This together with (3.4) and (3.5) shows that 
kn 


kn 
log on(t) — walt) S D> log bre(t) — an(t)\< $Y land) 
k=l 


2 


But from (3.3) we see |a,,(t)| S loi,t so that 
(3.6) log ¢,.(t) — ¥,(t)| S #- = sk = & Co, max ony 
4 k IskSkp 


where $\sigma_n^2$ is the variance of $S_n$. Now
$\log \varphi(t) = i\mu t + \int_{-\infty}^{\infty} f(t, x)\,dG(x)$.
Thus

\[ |\psi_n(t) - \log \varphi(t)| \le |t| \cdot \Bigl| \sum_{k=1}^{k_n} \mu_{nk} - \mu \Bigr|
   + \Bigl| \int_{-\infty}^{\infty} f(t, x)\,d(G_n(x) - G(x)) \Bigr|. \]

Applying Lemma 3 we see

\[ |\psi_n(t) - \log \varphi(t)| \le |t| \cdot \Bigl| \sum_{k=1}^{k_n} \mu_{nk} - \mu \Bigr|
   + E(n, t, m(A, \delta)) \]

and using (3.6) we have


log ¢n(t) — log g(t)| < {$tfo, max on, + |t\-\u, — u| + E(n, t, m(A, 8))} 


lsksky 
= A(t, n, m(A, 4)). 


Now using Lemma 1 we have |g,(¢) — ¢(¢)| S A(t, n, m(A, 4)) for |t] S T,. 


In order to apply Esseen’s Theorem 1 we consider 


Tr | yi | e Tn ° 
[ oe oll) \dt 32 | Wt, n, mi a oh ) dt S g(n, m(A, 4)). 
ss | “0 jl) 


Ty 


Now applying Theorem 1 we see that for any $a > 1$,

\[ \sup_{-\infty < x < \infty} |F_n(x) - F(x)|
   \le \frac{a}{2\pi}\,g(n, m(A, \delta)) + c(a) B \cdot \frac{1}{T_n}
   = k(a, B)\,g(n, m(A, \delta)) \]

where $k(a, B) = a/2\pi + c(a)B$. Q.E.D.

We shall now examine, under suitable conditions, the behavior of
$g(n, m(A, \delta))$ as $n$ becomes infinite. To this end we state Theorem 4
(cf. [4]) which gives the condition for the distribution functions $F_n(x)$ to
converge to a limiting distribution and also gives the form of the limiting
distribution.

THEOREM 4. Suppose that the random variables $(x_{nk} - \mu_{nk})$ are
infinitesimal. Then a necessary and sufficient condition for the convergence of
the distribution functions of sums $S_n = x_{n1} + \cdots + x_{nk_n}$ of
independent random variables with finite variance to a limiting distribution
function with finite variance, and the convergence of the variances of these
sums to the variance of the limiting distribution is that there exist a bounded
(non-decreasing) function $G(u)$ (with $G(-\infty) = 0$), and a real constant
$\mu$ such that

1) $\lim_{n \to \infty} \sum_{k=1}^{k_n} \int_{-\infty}^{u} x^2\,dF_{nk}(x + \mu_{nk}) = G(u)$
at all continuity points of $G(u)$,

2) $\lim_{n \to \infty} \sum_{k=1}^{k_n} \int_{-\infty}^{\infty} x^2\,dF_{nk}(x + \mu_{nk}) = \int_{-\infty}^{\infty} dG(u) = G(+\infty)$
and

3) $\lim_{n \to \infty} \sum_{k=1}^{k_n} \mu_{nk} = \mu$.

The characteristic function of the limiting distribution is given by Kolmogorov's
formula (2.1) using the constant $\mu$ and the function $G(u)$ just determined.
Motivated by this theorem we shall assume

\[
\begin{cases}
\text{(a) } (x_{nk} - \mu_{nk}) \text{ infinitesimal,} \\
\text{(b) } \lim_{n \to \infty} F_n(x) = F(x) \text{ (at all continuity points of } F(x)) \text{ and} \\
\text{(c) } \lim_{n \to \infty} \sigma_n^2 = \sigma^2
\end{cases}
\tag{3.7}
\]

and we see by Theorem 4 that, in the notation of (2.4),

\[
\begin{cases}
\text{1) } \lim_{n \to \infty} G_n(x) = G(x) \text{ at all continuity points of } G(x), \\
\text{2) } \lim_{n \to \infty} G_n(+\infty) = G(+\infty) \text{ and} \\
\text{3) } \lim_{n \to \infty} \sum_{k=1}^{k_n} \mu_{nk} = \mu
\end{cases}
\tag{3.8}
\]

where $G(x)$ and $\mu$ are the corresponding $G$ and $\mu$ of Kolmogorov's formula
(2.1) associated with the infinitely divisible distribution function $F(x)$.
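We have the following lemma.

Lemma 4. If the system $(x_{nk})$ satisfies (3.7) then
$\lim_{n \to \infty} \max_{1 \le k \le k_n} \sigma_{nk}^2 = 0$.

PROOF. We have

\[ \max_{1 \le k \le k_n} \sigma_{nk}^2
   = \max_{1 \le k \le k_n} \int_{-\infty}^{\infty} x^2\,dF_{nk}(x + \mu_{nk}). \]

Let $\varepsilon > 0$ be given and let $c > 0$ and $-c$ be continuity points of
$G(x)$ such that

\[ |G(c) - G(+\infty)| < \frac{\varepsilon}{7} \quad \text{and} \quad |G(-c)| < \frac{\varepsilon}{7}. \tag{3.9} \]

Now

\[
\begin{aligned}
\max_{1 \le k \le k_n} \int_{-\infty}^{\infty} x^2\,dF_{nk}(x + \mu_{nk})
 &\le \max_{1 \le k \le k_n} \int_{|x| \le \sqrt{\varepsilon/7}} x^2\,dF_{nk}(x + \mu_{nk}) \\
 &\quad + \max_{1 \le k \le k_n} \int_{\sqrt{\varepsilon/7} < |x| \le c} x^2\,dF_{nk}(x + \mu_{nk})
   + \max_{1 \le k \le k_n} \int_{|x| > c} x^2\,dF_{nk}(x + \mu_{nk}) \\
 &\le \frac{\varepsilon}{7}
   + c^2 \max_{1 \le k \le k_n} P\{|x_{nk} - \mu_{nk}| > \sqrt{\varepsilon/7}\}
   + G_n(-c) + G_n(+\infty) - G_n(c).
\end{aligned}
\]

By (3.7) and (3.8) we may take $N$ so large that $n > N$ implies
$\max_{1 \le k \le k_n} P\{|x_{nk} - \mu_{nk}| > \sqrt{\varepsilon/7}\} < \varepsilon/7c^2$,
$|G_n(-c) - G(-c)| < \varepsilon/7$,
$|G_n(+\infty) - G(+\infty)| < \varepsilon/7$
and $|G(c) - G_n(c)| < \varepsilon/7$.

Thus we see using (3.9) that

\[
\max_{1 \le k \le k_n} \sigma_{nk}^2 \le \frac{2\varepsilon}{7}
 + |G_n(-c) - G(-c)| + |G(-c)| + |G_n(+\infty) - G(+\infty)|
 + |G(+\infty) - G(c)| + |G(c) - G_n(c)| \le \varepsilon \quad \text{for } n > N.
\]

Q.E.D.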


With the notation of Theorem 3 we have the following lemma.

Lemma 5. If the random variables $(x_{nk})$ satisfy (3.7) then for fixed $A$ and
$\delta$, $\lim_{n \to \infty} \sum_{i=0}^{m} |G_n(x_i) - G(x_i)| = 0$, and
$\lim_{n \to \infty} |\mu_n - \mu| = 0$. This follows from 1 and 3 of (3.8).

We see from Lemmas 4 and 5 that the first, third and last part of the fourth
term of $g(n, m(A, \delta))$ (see (3.1)) approach zero as $n$ becomes infinite under
(3.7). Intuitively we think of $\delta$ being small and $A$ being large so that
$g(n, m(A, \delta))$ will be small. We shall formulate this idea precisely. Let
$1 > \delta > 0$ be such that
+ (1/5)'”” are continuity points of G(x), (G(x) is arbitrary but fixed), and con- 
sider g(n, m(1/é"”, 8)), (ie. let A = 1/8"*) and let m(1/8", 8) = m(8). If 1 > 6, 
> 0 is any sequence of constants then under Theorem 3, we know that 
(3.10) $\sup_{-\infty < x < \infty} |F_n(x) - F(x)| \le k(a, B)\, g(n, m(\delta_n))$.

The foregoing discussion leads to the following result which we state as a theorem. 

Theorem 5. If $F(x)$ and the random variables $(x_{nk})$ satisfy (3.7) and the hypothesis
of Theorem 3, then there exists a sequence $\{1 > \delta_n > 0\}$, $\delta_n \to 0$, such that
(3.10) holds and such that $\lim_{n\to\infty} g(n, m(\delta_n)) = 0$.

Proof. We know that (3.10) holds for any sequence $1 > \delta_n > 0$ of constants.
By Lemma 5 we see that if +(1/5)"”” are continuity points of G(x) then 

m(6) 


a IG,(x:) — G(x,)| > 0 
as n— «. Clearly we can find a sequence 6, — 0, +(1/6,)'” continuity points 
of G(x) such that yn” IG,.(x:) — G(x,)| > 0. But then using Lemmas 4 and 5 
and this sequence {6,} we see that g(n, m (4,)) — 0 as n becomes infinite. 
A result analogous to that given by Theorem 4 of Berry [1] is contained in the 
following corollary to Theorems 3 and 5. 
Coro.uary. Under the hypothesis of Theorem 5 if n is so large that 


| Kn i 
ID [i oF aPalu + wu) — Ole) | = 3, 
k=l “—o@ 


2 5/4 
max on- S56 


lsk<kn 


and |u, — u| < 3” 


then there exists a finite positive number K, independent of n and 6 such that M, S 
Ks". 

Now of course Theorems 3 and 5 require $dF(x)/dx$ to exist so that in particular
$F(x)$ is continuous. By use of Theorem 2 we can obtain a theorem weakening
the condition of continuity but which will require $F_n(x)$ to be a step function.
(As a special case where we require $F(x)$ itself to be a step function we get a
stronger result due to the fact that $dF(x)/dx = 0$ wherever it exists.) Using the
notation of Theorem 3 we have the following theorem.

Theorem 6. Let $F(x)$ be an infinitely divisible distribution function with two
moments such that if $F(x)$ has discontinuities at $x_\nu$
($x_\nu < x_{\nu+1}$, $\nu = 0, \pm 1, \pm 2, \dots$),
then there exists a constant $L$ such that $\min (x_{\nu+1} - x_\nu) \ge L$. Suppose that $dF(x)/dx$
exists everywhere except at $x = x_\nu$, $\nu = 0, \pm 1, \pm 2, \dots$, and $|dF(x)/dx| \le B$,
$x \ne x_\nu$. Then if $F_n(x)$ is a step function whose only possible discontinuities are
$x = x_\nu$, $\nu = 0, \pm 1, \pm 2, \dots$, and if $\sigma_{nk}^2 \le 1$, $k = 1, 2, \dots, k_n$, it follows that
for any $a > 1$, $\sup_{-\infty < x < \infty} |F_n(x) - F(x)| \le k(a, B)\, g(n, m(A, \delta))$ provided that
$L / g(n, m(A, \delta)) \ge c_2(a)$, where $c_2(a)$ is the constant determined in Theorem 2
and $k(a, B)$ has the same meaning as in Theorem 3.

The proof of this theorem is the same as that of Theorem 3 except that we use 
Theorem 2 instead of Theorem 1. 

Now if we define

$$g_1(n, m(A, \delta)) = \sigma_n^2 \max_{1 \le k \le k_n} \sigma_{nk}^2 + \delta(\sigma_n^2 + \sigma^2)
+ \sum_{i=0}^{m} |G_n(x_i) - G(x_i)|
+ \{G_n(+\infty) - G_n(A) + G(+\infty) - G(A) + G_n(-A) + G(-A)\}/A + |\mu_n - \mu|$$

we have the following theorem.

Theorem 7. Let $F(x)$ be an infinitely divisible distribution function with two
moments and further let it be a step function with discontinuities at $x = x_\nu$
($x_\nu < x_{\nu+1}$, $\nu = 0, \pm 1, \pm 2, \dots$). We assume there exists a constant $L$ such that
$\min (x_{\nu+1} - x_\nu) \ge L$. Then if $F_n(x)$ is a step function whose only possible
discontinuities are $x = x_\nu$, $\nu = 0, \pm 1, \pm 2, \dots$, it follows that for any $a > 1$ there
exists a constant $k(a)$ depending only on $a$ such that
$\sup_{-\infty < x < \infty} |F_n(x) - F(x)| \le k(a)\, g_1(n, m(A, \delta))$
provided that $\max_{1 \le k \le k_n} \sigma_{nk}^2 \le L / c_2(a)$.

Proof. Clearly the essential difference between this theorem and Theorem 6 is the
absence of the roots in the expression $g_1(n, m(A, \delta))$. The reason for this is that $B$
of Theorem 2 may be replaced by zero here. With this in mind, if we define
$T_n = T = c_2(a)/L$, a proof analogous to that of Theorem 3 will hold here as well.

We remark that both Theorems 6 and 7 can be extended just as Theorem 3 was 
and that the remarks and lemmas following Theorem 3 hold here as well. 


4. Specialized theorems for the cases where the limiting distribution is
Gaussian or Poisson. In the special cases where the limiting distribution is
Gaussian or Poisson the results of the last section may be simplified. For the
Gaussian distribution,

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2} \, du,$$

the $G$ of Kolmogorov's formula (2.1) is given by

(4.1) $$G(x) = \begin{cases} 0, & x < 0 \\ 1, & x \ge 0 \end{cases}$$

and for the Poisson distribution, $F(x) = \sum_{0 \le k \le x} e^{-\lambda} \lambda^k / k!$, we have

(4.2) $$G(x) = \begin{cases} 0, & x < 1 \\ \lambda, & x \ge 1. \end{cases}$$

The simple nature of the $G$'s in both of these cases is the reason the results may be
simplified.
4a. The Gaussian distribution. In contrast to (3.1) we define (for any $\varepsilon > 0$)


go(n, -) = [go max onl” + [4° max (0%, 1)-<)" 


isksk, 
ol aa kn 1/3 
+ [> / x dF (x + Unk) + ho’, — | + | 
k=} “|z| 2>6€ 


We have the following theorem.

Theorem 8. Let $\{\varepsilon_n > 0\}$ be any sequence of constants. If $\sigma_{nk} \le 1$, $k = 1, \dots, k_n$,
then for any $a > 1$, $\sup_{-\infty < x < \infty} |F_n(x) - \Phi(x)| \le k(a)\, g_2(n, \varepsilon_n)$, where $k(a)$ is a
constant depending only on $a > 1$.

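Proof. The proof here follows the same lines as the proof of Theorem 3.
Suffice it to say that in estimating $\left| \int_{-\infty}^{\infty} f(t, x) \, d[G_n(x) - G(x)] \right|$, where $G(x)$ is
given by (4.1), we consider

$$\int_{-\infty}^{\infty} f(t, x) \, d[G_n(x) - G(x)] = \int_{-\infty}^{-\varepsilon_n} f(t, x) \, d[G_n(x) - G(x)]
+ \int_{-\varepsilon_n}^{\varepsilon_n} f(t, x) \, d[G_n(x) - G(x)] + \int_{\varepsilon_n}^{\infty} f(t, x) \, d[G_n(x) - G(x)]$$

instead of using Lemma 3. Using integration by parts on $\int_{-\varepsilon_n}^{\varepsilon_n}$, and noticing that
on $(-\infty, -\varepsilon_n]$ and $[\varepsilon_n, +\infty)$, $G_n(x) - G(x)$ is increasing, we obtain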


[sux de.@) -e@) sD fa arule + us) 
© dX [lz] Sen 
+ : lo? — 1] + $lt!” max (02, I]-«,. 


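Now if we assume (3.7) where $F(x)$ is the Gaussian distribution $\Phi(x)$, it follows
that for any $\varepsilon > 0$,

(4.3)
(1) $\lim_{n\to\infty} \sum_{k=1}^{k_n} \int_{|x| \ge \varepsilon} x^2 \, dF_{nk}(x + \mu_{nk}) = 0$,
(2) $\lim_{n\to\infty} \sum_{k=1}^{k_n} \int_{|x| < \varepsilon} x^2 \, dF_{nk}(x + \mu_{nk}) = 1$, and
(3) $\lim_{n\to\infty} \sum_{k=1}^{k_n} \mu_{nk} = 0$.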

Thus, using an argument similar to that used in Theorem 5, we see that if we
assume (3.7) then there exists a sequence $\{\varepsilon_n > 0\}$, $\varepsilon_n \to 0$, such that
$\lim_{n\to\infty} g_2(n, \varepsilon_n) = 0$.

In order to see more precisely how $g_2(n, \varepsilon_n)$ behaves, we shall consider, under
appropriate assumptions, finding explicitly a sequence $\{\varepsilon_n\}$ which will make
$g_2(n, \varepsilon_n)$ approach zero as $n$ becomes infinite. We assume that the random
variables of the system $(x_{nk})$ satisfy (3.7) (and therefore (4.3)) so that in particular
by Lemma 4 we have $\lim_{n\to\infty} \max_{1 \le k \le k_n} \sigma_{nk}^2 = 0$. Also assume that there exists
a $p > 1$ such that


1/p 
(4.4) max - (Uf (x”)? AF x(a + =) 
lskskn yk 


is bounded in $n$, and let $q$ be determined by $1/p + 1/q = 1$. Then it follows that
if we take


(4.5) 6. = [ max ar 
Isksky 
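that $g_2(n, \varepsilon_n) \to 0$ as $n \to \infty$. In fact the only term in $g_2(n, \varepsilon_n)$ that needs study is
$\sum_{k=1}^{k_n} \int_{|x| \ge \varepsilon_n} x^2 \, dF_{nk}(x + \mu_{nk})$. But by Hölder's inequality this is

$$\le \sum_{k=1}^{k_n} \left( \int_{|x| \ge \varepsilon_n} (x^2)^p \, dF_{nk}(x + \mu_{nk}) \right)^{1/p} \left( \int_{|x| \ge \varepsilon_n} dF_{nk}(x + \mu_{nk}) \right)^{1/q},$$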


and using Tchebycheff’s inequality we see this last expression is 


2 1/q 
. MAX GOnk 
S B-o,\ 'skstn 
En 


where B is the bound on the expression in (4.4). Using (4.5) we obtain 


kn 
= / av dF a(x + par) S B-on{ max oy) °*. 
k=l 4jz|>en IskSkp 
Thus using Theorem 8 we have proven the following theorem. 

Theorem 9. If $\sigma_{nk} \le 1$, $k = 1, 2, \dots, k_n$, and if the random variables satisfy
(4.4) then


sup |F,(x) — ®(x)| S k(a){[$o, max ons)? + |2p,|"" + [3(0, — 1)|'" 
—w<r<eo IskSkp 


+ ((4° max (0%, 1))'* + [Bor}*)[ max o,4)°**}. 
Iskskn 
We remark that if the $(x_{nk})$ are not assumed to have any moments and if the
limiting distribution is Gaussian or Poisson it is possible, using the method of
truncation, to obtain an error estimate on $\sup_{-\infty < x < \infty} |F_n(x) - F(x)|$ and to
show that the estimate approaches zero under conditions analogous to (3.7). (To
show that the estimate approaches zero it is necessary to know (among other
things) that $\lim_{n\to\infty} \sum_{k=1}^{k_n} \int_{|x| > r} dF_{nk}(x) = 0$ for some $r > 0$. This is not
necessarily true if the limiting distribution is not Gaussian or Poisson.)
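4b. The Poisson distribution. Define (for any $\varepsilon > 0$)

$$g_3(n, \varepsilon) = \Big\{ \sigma_n^2 \max_{1 \le k \le k_n} \sigma_{nk}^2 + |\mu_n - \lambda|
+ \sum_{k=1}^{k_n} \int_{|x - 1| \ge \varepsilon} x^2 \, dF_{nk}(x + \mu_{nk})
+ |\sigma_n^2 - \lambda| + \max (\sigma_n^2, \lambda) \cdot \varepsilon \Big\},$$

where $\lambda$ is the parameter of the Poisson distribution. We have the following
theorem analogous to Theorem 8.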

Theorem 10. Let $F(x)$ be the Poisson distribution and assume that $F_n(x)$ is a
step function whose only possible discontinuities are at $x = 0, 1, 2, \dots$. Then it
follows that for every $a > 1$ there exists a constant $d(a)$ depending only on $a$ such that
$\sup_{-\infty < x < \infty} |F_n(x) - F(x)| \le d(a)\, g_3(n, \varepsilon_n)$ provided that
$\max_{1 \le k \le k_n} \sigma_{nk}^2 \le 1/c_2(a)$, where $c_2(a)$ is the constant determined in Theorem 2.

The proof of this theorem is essentially the same as the proof of Theorems 3
and 8 and will be omitted. We remark however that in place of $T_n$ in Theorem 3
we let $T = c_2(a)$ and restrict $|t| < T$.













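Now if we assume (3.7) with $F(x)$ the Poisson distribution, it follows that

(4.6)
(1) $\lim_{n\to\infty} \sum_{k=1}^{k_n} \int_{|x - 1| \ge \varepsilon} x^2 \, dF_{nk}(x + \mu_{nk}) = 0$,
(2) $\lim_{n\to\infty} \sum_{k=1}^{k_n} \int_{|x - 1| < \varepsilon} x^2 \, dF_{nk}(x + \mu_{nk}) = \lambda$,
(3) $\lim_{n\to\infty} \sum_{k=1}^{k_n} \mu_{nk} = \lambda$,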
and the same type of remarks following Theorem 8 hold here as well. In particular,
under (3.7), we see that there exists a sequence $\{\varepsilon_n > 0\}$ such that
$\sup_{-\infty < x < \infty} |F_n(x) - F(x)| \le d(a)\, g_3(n, \varepsilon_n)$ and $g_3(n, \varepsilon_n)$ approaches zero as $n \to \infty$.
We could also consider finding an explicit sequence $\{\varepsilon_n\}$ such that $g_3(n, \varepsilon_n)$
approaches zero as $n$ becomes infinite. In fact if $\{\varepsilon_n > 0, \varepsilon_n \to 0\}$ is such that
for some $\eta$, $0 < \eta < 1$, and for some $p > 1$,


1/p 
| , 
as «i P I os 
Isksk, o k |z—1| De, ” d ne (a nt) 


n® \ |zl29 


is bounded in $n$, then (under (3.7) and hence (4.6)) $g_3(n, \varepsilon_n)$ approaches 0. This
follows in the same way as the proof of Theorem 9.

As we have said, the simple nature of the $G(x)$'s defined by (4.1) and (4.2) is
the reason the error terms for the Gaussian and Poisson distribution are simplified
compared to the general case. Evidently the same type of arguments used in
this section could be used for other limiting distributions, provided that the
corresponding $G(x)$ is of this simple form. Let

(4.7) $$G(x) = \begin{cases} \sigma^2, & x \ge b \\ 0, & x < b \end{cases} \qquad \sigma \ne 0.$$

(If $\sigma = 0$ this $G(x)$ corresponds to the unitary distribution.) This leads to the
following theorem.

Theorem 11. Let $X$ be a random variable with infinitely divisible distribution
function $F(x)$. Suppose that $F(x)$ has mean $\mu$ and finite variance $\sigma^2$.² Let the $G(x)$
of Kolmogorov's formula (2.1) be given by (4.7). Then if $b = 0$, $F(x)$ is the Gaussian
distribution, and if $b \ne 0$, the random variable $X' = (X - \mu + \sigma^2/b)/b$ is Poisson
distributed with parameter $\sigma^2/b^2$.

This theorem follows readily from an examination of the characteristic func- 
tions of X and X’. 
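
A sketch of that examination, assuming Kolmogorov's formula (2.1) has its standard form $\log \varphi(t) = i\mu t + \int (e^{itx} - 1 - itx)\, x^{-2}\, dG(x)$ and writing $\sigma^2$ for the size of the single jump of $G$ at $b$ (both are assumptions about notation not displayed in this section):

```latex
% Case b \ne 0: G has one jump of size \sigma^2 at x = b, so
\log \varphi_X(t) = i\mu t + \frac{\sigma^2}{b^2}\left(e^{itb} - 1 - itb\right).
% For X' = (X - \mu + \sigma^2/b)/b we have
\log \varphi_{X'}(t)
  = \frac{it}{b}\left(-\mu + \frac{\sigma^2}{b}\right) + \log \varphi_X\!\left(\frac{t}{b}\right)
  = \frac{it\sigma^2}{b^2} + \frac{\sigma^2}{b^2}\left(e^{it} - 1 - it\right)
  = \frac{\sigma^2}{b^2}\left(e^{it} - 1\right),
% which is the logarithm of the Poisson characteristic function
% with parameter \sigma^2 / b^2.
%
% Case b = 0: the integrand is read by continuity at x = 0 as -t^2/2, so
\log \varphi_X(t) = i\mu t - \frac{\sigma^2 t^2}{2},
% the Gaussian characteristic function with mean \mu and variance \sigma^2.
```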


5. An example. We now consider a specific example, that is a specific system
of random variables $(x_{nk})$. The system we define here is the system considered in





2 If X does not have any moments, then a similar theorem holds using the Lévy-Khint- 
chine formula for the representation of infinitely divisible distributions [7]. 
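the classical Poisson theorem, that is $x_{nk}$ is determined by

$$P\{x_{nk} = 1\} = \frac{\lambda}{n}, \qquad P\{x_{nk} = 0\} = 1 - \frac{\lambda}{n}, \qquad \text{where } \lambda \ge 0 \ (\lambda/n < 1),$$

$k = 1, 2, \dots, n$. We define $S_n$ in the usual way so that $S_n = x_{n1} + \dots + x_{nn}$.
It is well known that the distribution functions $F_n(x)$ of $S_n$ approach the Poisson
distribution with parameter $\lambda$. Using Theorem 10 we consider
$\sup_{-\infty < x < \infty} |F_n(x) - F(x)|$. We note that $\mu_{nk} = \lambda/n$ and $\sigma_{nk}^2 = \lambda(1 - \lambda/n)/n$.
Assume that $n > 2\lambda$ and define $\varepsilon_n = \lambda r/n$ where $2 > r > 1$. Now consider the terms
involved in $g_3(n, \varepsilon_n)$:

$$\sigma_n^2 \max_{1 \le k \le k_n} \sigma_{nk}^2 = \frac{\lambda^2}{n}\left(1 - \frac{\lambda}{n}\right)^2, \qquad |\mu_n - \lambda| = 0,$$

$$|\sigma_n^2 - \lambda| = \frac{\lambda^2}{n}, \qquad \max (\sigma_n^2, \lambda) \cdot \varepsilon_n = \frac{\lambda^2 r}{n},$$

$$\sum_{k=1}^{k_n} \int_{|x - 1| \ge \varepsilon_n} x^2 \, dF_{nk}(x + \mu_{nk}) = n \left(\frac{\lambda}{n}\right)^2 \left(1 - \frac{\lambda}{n}\right) = \frac{\lambda^2}{n}\left(1 - \frac{\lambda}{n}\right).$$

Thus (for $n \ge c_2(a) \cdot \lambda$) we see (since $r$ may be taken arbitrarily close to 1)

$$\sup_{-\infty < x < \infty} |F_n(x) - F(x)| \le d(a) \left[ \frac{4\lambda^2}{n} - \frac{3\lambda^3}{n^2} + \frac{\lambda^4}{n^3} \right] \le \frac{D}{n},$$

where $D$ is a constant.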

We remark that although in the above example the system $(x_{nk})$ is explicitly
given, by use of the theorems given here bounds can be obtained on $M_n$ without
actually knowing specifically the system $x_{nk}$ involved. Finally we remark that
analogous results to those presented here could be obtained by considering sums
$S_n^* = x_{n1} + \dots + x_{nk_n} - A_n$ in place of $S_n = x_{n1} + \dots + x_{nk_n}$, where the $A_n$
are constants.
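
The $1/n$ rate in the example can be checked numerically. The following is a minimal sketch, not part of the paper: it computes the exact value of $\sup_x |F_n(x) - F(x)|$ for $S_n$ binomial with parameters $n$ and $\lambda/n$ against the Poisson distribution with parameter $\lambda$, and evaluates the sum of the five terms of $g_3(n, \varepsilon_n)$ listed above. The function names `sup_cdf_distance` and `g3_example` are hypothetical, and the constant $d(a)$ of Theorem 10 is not modelled.

```python
import math


def sup_cdf_distance(n, lam):
    """Exact sup_x |F_n(x) - F(x)| for F_n = Binomial(n, lam/n), F = Poisson(lam)."""
    p = lam / n
    b = (1.0 - p) ** n      # binomial mass at 0
    q = math.exp(-lam)      # Poisson mass at 0
    fb = fp = worst = 0.0
    # Both distribution functions are step functions on the integers; for x > n
    # the gap 1 - F(x) only shrinks, so scanning j = 0..n is enough.
    for j in range(n + 1):
        fb += b
        fp += q
        worst = max(worst, abs(fb - fp))
        # Recurrences for the next point masses (avoids huge factorials).
        b *= (n - j) / (j + 1) * p / (1.0 - p)
        q *= lam / (j + 1)
    return worst


def g3_example(n, lam, r):
    """Sum of the five terms of g_3(n, eps_n) in the example, eps_n = lam * r / n."""
    s = 1.0 - lam / n
    return lam ** 2 / n * (s ** 2 + s + 1.0 + r)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, sup_cdf_distance(n, 1.0), g3_example(n, 1.0, 1.0))
```

With $r = 1$ the second function reproduces the bracket $4\lambda^2/n - 3\lambda^3/n^2 + \lambda^4/n^3$, and the product $n \cdot \sup_x |F_n(x) - F(x)|$ stays roughly constant as $n$ grows.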
DECISION RULES, BASED ON THE DISTANCE, FOR PROBLEMS OF
FIT, TWO SAMPLES, AND ESTIMATION¹

1. Summary. Concrete decision rules are given for the problem of goodness of
fit and the problem of two samples with a risk smaller than any preassigned
value. The problem of estimation is also treated.


2. Introduction. In the theory of statistical decision functions it is very
desirable to give the problem considered a concrete solution which evaluates the
risk. We have previously given a concrete form of the decision function when the
distribution functions involved are specified, so that the risk can be made smaller
than an arbitrarily preassigned positive number by suitable choice of sample size
([1], [2]). In that case, the following notions of affinity and distance played an
important role. Let $F_1$, $F_2$ be simultaneously discrete or continuous distribution
functions, so that by the aid of a suitable measure $m$, the probabilities of a set
$E$ under $F_1$ and $F_2$ can be written as

$$F_1(E) = \int_E p_1(x) \, dm, \qquad F_2(E) = \int_E p_2(x) \, dm$$

respectively. Here $E$ denotes an arbitrary measurable set for which the probability
under $F_1$ or $F_2$ is defined. The distance between $F_1$ and $F_2$ is then given by

$$\|F_1 - F_2\| = \left( \int_R \left( \sqrt{p_1(x)} - \sqrt{p_2(x)} \right)^2 dm \right)^{1/2}$$

and the affinity between $F_1$ and $F_2$ by

$$\rho(F_1, F_2) = \int_R \sqrt{p_1(x)} \sqrt{p_2(x)} \, dm,$$

where $R$ denotes the whole sample space (of one dimension).

In the present paper, using the distance $\|\ \|$, we shall give a concrete solution
to the problem of goodness of fit and the two-sample problem and mention
finally that the estimation problem can be treated similarly. Our treatment of
the problem of fit is based on the following considerations.² It is not necessary
to decide whether the random variable on which the observation is made has
exactly the given distribution or not but to decide whether the variable has a
distribution near the given one. On the other hand, from a finite number of
observations it is impossible, or at least very difficult, to discern efficiently whether
the variable has exactly the given distribution when in the alternative class of
distributions there exists one which lies within any distance from the given
distribution. From this point of view, we formulate the problem as follows.²

Received July 20, 1953, revised April 15, 1955.

¹ The results of this paper were announced in March, 1953 at a meeting of the Institute
of Statistical Mathematics.

² See [2]. These considerations and the formulation of the problem were also given
in [3].

When a distribution $F_0$ is given, decide whether the random variable has the
distribution $F_0$ or a distribution outside some neighborhood of $F_0$.

The neighborhood of $F_0$ mentioned here is defined as the set of distributions
$\{F : \|F - F_0\| < \varepsilon\}$, where a positive number $\varepsilon$ is determined according to the
nature of the problem concerned. Throughout this paper we shall consider only
discrete distributions with a finite number of possible outcomes. For practical
purposes this does not involve an essential loss of generality since in most
statistical applications the quantities observed can be grouped in a finite number of
classes.

The actual method of our treatment of the problem is based on the following
properties of distance which the distance $\|\ \|$ possesses:

(1) Axioms of distance:

(i) $\delta(F, G) \ge 0$, $\delta(F, F) = 0$,
(ii) $\delta(F, G) = \delta(G, F)$,
(iii) $\delta(F, G) + \delta(G, H) \ge \delta(F, H)$,

for any distributions $F$, $G$ and $H$, where $\delta(F, G)$ denotes the distance between
$F$ and $G$.

(2) For any integer $n$ and any positive number $\eta$ there can be found a sequence
of numbers $\{B_n\}$ such that

$$\Pr\{\delta(F, S_n) > \eta\} \le B_n$$

for any $F$ in the class of distribution functions under consideration, where $S_n$
denotes the empirical distribution function based on $n$ observations of a random
variable with distribution $F$, and such that $B_n \to 0$ as $n \to \infty$. There are, of
course, other distances having properties (1) and (2). For instance,

$$\delta(F, F') = \left( \sum_i (p_i - p_i')^2 \right)^{1/2},$$

where $F$ and $F'$ are discrete distributions defined on the same events with
probabilities $p_1, \dots, p_k$ and $p_1', \dots, p_k'$, respectively. Actually, our method can
be applied with any definition of distance which has properties (1) and (2).
Among the above properties, (1)-(ii) need not necessarily hold for the problem
of fit, but must hold for the two-sample problem. (1)-(iii), that is, the triangle
inequality, must always hold. A so-called directed distance, like

$$\left( \int \left( F_1(x) - F_2(x) \right)^2 dF_1(x) \right)^{1/2},$$

does not always satisfy these conditions, so that we cannot use it, at least for the
two-sample problem. Further, $\chi^2$ itself does not have property (2).

Since the inference will be based on a finite number of observations, it is
desirable that the distance we use has the property:

(3) The distance represents well the discrepancy between distributions at
every point, or, for the problem of fit, it represents the discrepancy at the point
with large probability (or density) better than that at the point with small
probability (density).

Taking into account (3), the Fréchet distance $\delta_1(F, G) = \sup_x |F(x) - G(x)|$
is seen to be unsuitable, as is any distance similar to it, although it is useful in
cases where the convergence or a property in the limit is concerned. The same
can be said concerning $\chi^2$. When we consider $\chi^2$ as a quantity which expresses
the discrepancy between the theoretical and the empirical distribution, the
discrepancy at the point with small probability is liable to make an excessive
contribution. This shows that, in general, $\chi^2$ is not always suitable for the problem
of fit.

On the other hand our distance $\|\ \|$ seems to satisfy property (3)
adequately and is also simple to compute.

With this distance we shall give a simple upper bound of the risk and at the
same time show how to make the risk smaller than any preassigned positive
number. This is not the case with the tests thus far presented, like the $\chi^2$-test.


3. Properties of distance $\|\ \|$ and affinity $\rho$. In this paper, we shall use only
the distance $\|\ \|$ explicitly, but for its calculation the affinity $\rho$ will prove
useful. Therefore, in the following we shall state the properties of $\rho$ as well as those
of the distance $\|\ \|$.

First we have

$$0 \le \rho(F_1, F_2) \le 1,$$
$$\|F_1 - F_2\|^2 = 2(1 - \rho(F_1, F_2)) \le 2,$$
$$\|F_1 - F_2\|^2 \le \int_R |p_1(x) - p_2(x)| \, dm \le 2 \|F_1 - F_2\|.$$

From these relations it follows that

$$|F_1(E) - F_2(E)| \le 2 \|F_1 - F_2\|$$

for any set $E$, and that for a sequence of distributions $\{F_n\}$ and a distribution
$F_0$ it holds

$$\|F_n - F_0\| \to 0 \quad (n \to \infty)$$

when and only when

(3.1) $|F_n(E) - F_0(E)| \to 0$ $(n \to \infty)$ uniformly in $E$.

Further, $\|F_1 - F_2\| = 0$ when and only when $F_1(E) = F_2(E)$ for every set $E$.
We also note that $\|F_n - F_0\| \to 0$ $(n \to \infty)$ implies $\rho(F_n, F_0) \to 1$ $(n \to \infty)$
and vice versa.
Relation (3.1) is Wald's definition of regular convergence in finite-dimensional
sample space. When the set of distributions $\Omega$, therefore, is separable in the
sense of regular convergence, this topology is equivalent to that given by our
distance $\|\ \|$. It follows from this fact that if $\Omega$ is compact with respect to
the topology induced by regular convergence, it is also compact with respect to
the topology induced by our distance.


4. Fundamental theorems. Let $F$ be a discrete distribution with probabilities
$p_1, p_2, \dots, p_k$ for the events (1), (2), ..., (k), respectively. Let $n_i$ be the
number of occurrences of event $(i)$ in $n$ observations. We denote the empirical
distribution $(n_1/n, \dots, n_k/n)$ by $S_n$. Then, by definition,

$$\|F - S_n\|^2 = \sum_{i=1}^{k} \left( \sqrt{\frac{n_i}{n}} - \sqrt{p_i} \right)^2 = 2 \left( 1 - \sum_{i=1}^{k} \sqrt{\frac{n_i}{n}} \sqrt{p_i} \right).$$

The last expression in terms of the affinity $\rho = \sum_{i=1}^{k} \sqrt{n_i/n} \sqrt{p_i}$ can be used
for the calculation of the distance $\|F - S_n\|$.
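Theorem I. When the random variable concerned has the discrete distribution $F$,
then we have

$$\Pr\left\{ \|F - S_n\|^2 < \frac{(k - 1)t}{n} \right\} \ge 1 - \frac{1}{t}$$

for any positive number $t$.

Proof. Let $p_1, \dots, p_{k'} > 0$, and $p_{k'+1} = \dots = p_k = 0$ $(k' \le k)$. Then,
clearly

$$\|F - S_n\|^2 = \sum_{i=1}^{k} \left( \sqrt{\frac{n_i}{n}} - \sqrt{p_i} \right)^2 \le \sum_{i=1}^{k'} \frac{(n_i/n - p_i)^2}{p_i} + \sum_{i=k'+1}^{k} \frac{n_i}{n}.$$

Accordingly,

$$E(\|F - S_n\|^2) \le \sum_{i=1}^{k'} \frac{1}{p_i} E\left( \frac{n_i}{n} - p_i \right)^2 = \frac{1}{n} \sum_{i=1}^{k'} (1 - p_i) = \frac{k' - 1}{n} \le \frac{k - 1}{n},$$

where $E$ denotes the expectation with respect to $F$. Now, an inequality of Markov
shows

$$\Pr\{ \|F - S_n\|^2 < E(\|F - S_n\|^2)\, t \} \ge 1 - \frac{1}{t} \quad \text{for any positive } t,$$

and we have

$$\Pr\left\{ \|F - S_n\|^2 < \frac{(k - 1)t}{n} \right\} \ge 1 - \frac{1}{t},$$

which we wanted to prove.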
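
As an illustration only, the bound of Theorem I, read here as $\Pr\{\|F - S_n\|^2 < (k-1)t/n\} \ge 1 - 1/t$, can be compared with simulated frequencies. The sketch below draws samples with Python's standard `random` module; the function names, the example distribution and the number of trials are assumptions of this sketch, not part of the paper.

```python
import math
import random


def sq_distance(p, q):
    """||F - S||^2 = sum_i (sqrt(p_i) - sqrt(q_i))^2."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def empirical(sample, k):
    """Empirical distribution (n_1/n, ..., n_k/n) of outcomes coded 0..k-1."""
    counts = [0] * k
    for x in sample:
        counts[x] += 1
    return [c / len(sample) for c in counts]


def coverage(p, n, t, trials=2000, seed=0):
    """Simulated Pr{||F - S_n||^2 < (k-1)t/n} and simulated E||F - S_n||^2."""
    rng = random.Random(seed)
    k = len(p)
    bound = (k - 1) * t / n
    hits, total = 0, 0.0
    for _ in range(trials):
        sample = rng.choices(range(k), weights=p, k=n)
        d2 = sq_distance(p, empirical(sample, k))
        total += d2
        hits += d2 < bound
    return hits / trials, total / trials


if __name__ == "__main__":
    freq, mean_d2 = coverage([0.5, 0.3, 0.2], n=50, t=4.0)
    print("simulated coverage:", freq, " guaranteed at least:", 1 - 1 / 4.0)
    print("simulated mean    :", mean_d2, " bound (k-1)/n:", 2 / 50)
```

In runs of this kind the simulated coverage sits well above $1 - 1/t$, which is consistent with the Markov inequality being a conservative bound.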

When $p_i > 0$, $i = 1, 2, \dots, k$, and $n$ is large, $\chi^2 = \sum_{i=1}^{k} (n_i - np_i)^2 / np_i$ is
asymptotically distributed according to the chi-square distribution with $k - 1$
degrees of freedom. Therefore, from the relation


ur sup = LS =O (sg g/B" 
we have:

Theorem II. When the random variable concerned has the discrete distribution
$F$ with $p_i > 0$ $(i = 1, 2, \dots, k)$ and $n$ is large, we have an asymptotic equality

$$\Pr\{ \|F - S_n\|^2 < \delta^2 \} = \Pr\{ \chi^2_{(k-1)} < 4n\delta^2 \}$$

for any number $\delta$, where $\chi^2_{(k-1)}$ is a random variable having the chi-square
distribution with $k - 1$ degrees of freedom.

When applying such an inequality or equality to the problem of fit, for instance,
it must hold for any distribution under consideration (see below). Therefore,
in case the alternative class of distributions against the specified distribution
includes one with any small probability $p_i$, we cannot use an inequality or
equality containing $D^2(\chi^2)$, like

$$\Pr\{ \|F - S_n\|^2 < \eta \} \ge 1 - \frac{1}{n^2 \eta^2} \{ (k - 1)^2 + D^2(\chi^2) \},$$

which is derived from the relation $\|F - S_n\|^2 \le \chi^2 / n$ when
$p_i > 0$ $(i = 1, 2, \dots, k)$,
although it may seem more precise than the inequality in Theorem I at a glance.
For we have then $\sup_F D^2(\chi^2) = \infty$ and cannot obtain an adequate inequality or
equality for our purpose. On the other hand, the inequality in Theorem I holds
in any case and is applicable to the problem. When there is a positive lower
bound for all $p_i$, we can obtain more precise inequalities. For example, let $p_0$ be
such a lower bound. Then we have

$$\Pr\{ \|F - S_n\|^2 < \eta \} \ge 1 - \frac{1}{n^2 \eta^2} \left\{ k^2 - 1 + \frac{1}{n} \left( \frac{k}{p_0} - k^2 - 2k + 2 \right) \right\}$$



or, when 
1 k 2 ) 
k+1+——— [|— -— k -— 2k +2}, 
nn >k + +a yalé + 
nn — k + 1)? 
Pri||F — S,|[? <a} 21- th 
’ - we atk ~ 1) +2 (4 ~~ 2 +2) 
nN \Po 
(See [4].) The result of Vora ([6]) could also be used. In this paper, however, we
shall not assume that there exists such a lower bound. In the following we shall
denote generally by $(C_{n,k-1})$, or for short $(C_n)$, a class of distributions such that
$\chi^2$ based on $n$ observations of the random variable has asymptotically the
chi-square distribution with $k - 1$ degrees of freedom for any $F$ in $(C_{n,k-1})$. A set
of distributions $\{(p_1, p_2, \dots, p_k)\}$ in the same finite discrete space, for which
there exists a positive number $p_0$ such that $p_i > p_0$, defines such a class $(C_n)$
for $n$ sufficiently large.


5. The problem of goodness of fit. As stated in the introduction, our formulation
of the problem is to find a rule according to which, for any given finite
discrete distribution $F_0$ and $\delta_0 > 0$, one can decide whether a random variable
has distribution $F_0$ or a distribution $F$ with $\|F - F_0\| \ge \delta_0$.

Now, let $X$ be the random variable of interest. When $X$ has the distribution
$F_0$ we have by Theorem I

$$\Pr\left\{ \|F_0 - S_n\|^2 < \frac{(k - 1)t}{n} \right\} \ge 1 - \frac{1}{t},$$

and when $X$ has a distribution $F$ with $\|F - F_0\| \ge \delta_0$,

$$\|F_0 - S_n\| \ge \|F_0 - F\| - \|F - S_n\| \ge \delta_0 - \|F - S_n\|$$

and

$$\Pr\left\{ \|F - S_n\|^2 < \frac{(k - 1)t}{n} \right\} \ge 1 - \frac{1}{t}.$$

Consequently

$$\Pr\left\{ \|F_0 - S_n\| > \delta_0 - \sqrt{\frac{(k - 1)t}{n}} \right\} \ge 1 - \frac{1}{t}.$$

On the other hand, when $n \ge 4(k - 1)t / \delta_0^2$,

$$\delta_0 - \sqrt{\frac{(k - 1)t}{n}} \ge \sqrt{\frac{(k - 1)t}{n}}.$$

Therefore, we have:

Theorem III. For any positive number $t$, let $n \ge 4(k - 1)t / \delta_0^2$. Then, when
$X$ has the distribution $F_0$, we have

$$\Pr\left\{ \|F_0 - S_n\|^2 < \frac{(k - 1)t}{n} \right\} \ge 1 - \frac{1}{t},$$

and when $X$ has a distribution $F$ with $\|F - F_0\| \ge \delta_0$,

$$\Pr\left\{ \|F_0 - S_n\|^2 \ge \frac{(k - 1)t}{n} \right\} \ge 1 - \frac{1}{t}.$$

According to this theorem we can answer the problem formulated above. The
risk then can be made smaller than any preassigned value $\varepsilon$ by taking $t > 1/\varepsilon$.
Here we assume that the weight function is zero when the decision is correct
and less than or equal to 1 when the decision is wrong.
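
The rule implied by Theorem III can be sketched as follows, reading the sample-size condition as $n \ge 4(k-1)t/\delta_0^2$ and the threshold as $(k-1)t/n$ (the reading that the proof above supports). The function names `required_sample_size` and `decide_fit` and the example counts are hypothetical.

```python
import math


def required_sample_size(k, t, delta0):
    """Smallest integer n with n >= 4 (k - 1) t / delta0^2."""
    return math.ceil(4 * (k - 1) * t / delta0 ** 2)


def decide_fit(counts, p0, t):
    """Return True to decide for F0, False to decide ||F - F0|| >= delta0.

    counts: observed frequencies (n_1, ..., n_k); p0: probabilities under F0.
    The decision is for F0 exactly when ||F0 - S_n||^2 < (k - 1) t / n.
    """
    n = sum(counts)
    k = len(p0)
    d2 = sum((math.sqrt(c / n) - math.sqrt(p)) ** 2 for c, p in zip(counts, p0))
    return d2 < (k - 1) * t / n


if __name__ == "__main__":
    p0 = [0.5, 0.3, 0.2]
    t = 10.0                       # risk at most 1/t = 0.1 under either alternative
    n = required_sample_size(len(p0), t, delta0=0.5)
    print("sample size:", n)
    print(decide_fit([160, 96, 64], p0, t))   # frequencies equal to F0
    print(decide_fit([20, 60, 240], p0, t))   # frequencies far from F0
```

Choosing $t$ larger lowers the guaranteed risk $1/t$ at the cost of a proportionally larger sample size.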

If it is known that $F_0$ and any alternative distribution $(p_1, \dots, p_k)$ are
contained in a class $(C_n)$, Theorem II is applicable instead of Theorem I, and we have
the following theorem.

Theorem IV. Let $F_0$ and a distribution $F$ with $\|F - F_0\| \ge \delta_0$ be in a class
$(C_n)$, and let $\eta$ be any positive number smaller than $\delta_0$. Then, when $X$ has the
distribution $F_0$, we have

$$\Pr\{ \|F_0 - S_n\|^2 < \eta^2 \} = \Pr\{ \chi^2_{(k-1)} < 4n\eta^2 \}$$

and when $X$ has a distribution $F$

$$\Pr\{ \|F_0 - S_n\|^2 \ge \eta^2 \} \ge \Pr\{ \chi^2_{(k-1)} < 4n(\delta_0 - \eta)^2 \},$$

where $\chi^2_{(k-1)}$ is a random variable having the chi-square distribution with $k - 1$
degrees of freedom.

6. Two-sample problem. Let $X$ and $Y$ be (not necessarily independent)
random variables which have discrete (marginal) distributions $F = (p_1, \dots, p_k)$
and $G = (q_1, \dots, q_k)$ on the same events (1), ..., (k), respectively, that is,
$p_i = \Pr\{X = (i)\}$, $q_i = \Pr\{Y = (i)\}$; $i = 1, \dots, k$. Further, let $(n_1, \dots, n_k)$
$(\sum n_i = n)$ and $(m_1, \dots, m_k)$ $(\sum m_i = m)$ be observations on $X$ and $Y$,
respectively. Then we want to decide from the two sets of observations
$(n_1, \dots, n_k)$, $(m_1, \dots, m_k)$ whether $F = G$ or $\|F - G\| \ge \delta_0$, $\delta_0$ being a
preassigned positive number. In this problem we are also interested in whether
$F$ and $G$ lie near each other. Denote the empirical distributions $(n_1/n, \dots, n_k/n)$,
$(m_1/m, \dots, m_k/m)$ by $S_n$, $S'_m$, respectively. Then we have:

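Theorem V. Let $\eta$ be any positive number smaller than $\delta_0$. When $\|F - G\| = 0$,

(a) $$\Pr\{ \|S_n - S'_m\| < \eta \} \ge 1 - \frac{k - 1}{\eta^2} \left( \frac{1}{\sqrt{n}} + \frac{1}{\sqrt{m}} \right)^2$$

and when $\|F - G\| \ge \delta_0$,

(b) $$\Pr\{ \|S_n - S'_m\| \ge \eta \} \ge 1 - \frac{k - 1}{(\delta_0 - \eta)^2} \left( \frac{1}{\sqrt{n}} + \frac{1}{\sqrt{m}} \right)^2.$$

Moreover, if $X$ and $Y$ are independent of each other, we have

(c) $$\Pr\{ \|S_n - S'_m\| < \eta \} \ge 1 - \frac{4(k - 1)}{\eta^2} \left( \frac{1}{n} + \frac{1}{m} \right) + \frac{16(k - 1)^2}{\eta^4 nm}$$

when $\|F - G\| = 0$, and

(d) $$\Pr\{ \|S_n - S'_m\| \ge \eta \} \ge 1 - \frac{4(k - 1)}{(\delta_0 - \eta)^2} \left( \frac{1}{n} + \frac{1}{m} \right) + \frac{16(k - 1)^2}{(\delta_0 - \eta)^4 nm}$$

when $\|F - G\| \ge \delta_0$. In this case, (a) is more precise than (c) when and only when
$3(m + n) - 2\sqrt{mn} > 16(k - 1)/\eta^2$, and (b) is more precise than (d) when
and only when $3(m + n) - 2\sqrt{mn} > 16(k - 1)/(\delta_0 - \eta)^2$.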

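Proof. When $\|F - G\| = 0$, then

$$\|S_n - S'_m\| \le \|F - S_n\| + \|G - S'_m\|.$$

Therefore, according to an inequality of Markov we have

$$\Pr\{ \|S_n - S'_m\| < \eta \} \ge \Pr\{ \|F - S_n\| + \|G - S'_m\| < \eta \}$$
$$\ge 1 - \frac{1}{\eta^2} E\left( \|F - S_n\| + \|G - S'_m\| \right)^2$$
$$\ge 1 - \frac{1}{\eta^2} \left( \sqrt{E(\|F - S_n\|^2)} + \sqrt{E(\|G - S'_m\|^2)} \right)^2$$
$$\ge 1 - \frac{k - 1}{\eta^2} \left( \frac{1}{\sqrt{n}} + \frac{1}{\sqrt{m}} \right)^2,$$

and moreover, if $X$ and $Y$ are independent,

$$\Pr\{ \|S_n - S'_m\| < \eta \} \ge \Pr\{ \|F - S_n\| < \tfrac{1}{2}\eta, \ \|G - S'_m\| < \tfrac{1}{2}\eta \}$$
$$= \Pr\{ \|F - S_n\| < \tfrac{1}{2}\eta \} \Pr\{ \|G - S'_m\| < \tfrac{1}{2}\eta \}$$
$$\ge \left( 1 - \frac{4(k - 1)}{\eta^2 n} \right) \left( 1 - \frac{4(k - 1)}{\eta^2 m} \right)$$
$$= 1 - \frac{4(k - 1)}{\eta^2} \left( \frac{1}{n} + \frac{1}{m} \right) + \frac{16(k - 1)^2}{\eta^4 nm}.$$

When $\|F - G\| \ge \delta_0$, then

$$\|S_n - S'_m\| \ge \|F - G\| - \|F - S_n\| - \|G - S'_m\| \ge \delta_0 - \|F - S_n\| - \|G - S'_m\|,$$

from which we can obtain (b) and (d).

The remaining part of the theorem can easily be seen.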

On the basis of this theorem we can make decisions about the two-sample 
problem. 
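
The comparison between bounds (a) and (c) of Theorem V can be checked by direct computation. The sketch below reads (a) as $1 - \frac{k-1}{\eta^2}(1/\sqrt{n} + 1/\sqrt{m})^2$ and (c) as $(1 - \frac{4(k-1)}{\eta^2 n})(1 - \frac{4(k-1)}{\eta^2 m})$, a reconstruction taken from the steps of the proof; the function names are hypothetical, and the comparison is purely algebraic (a negative value means the bound is vacuous).

```python
import math


def bound_a(k, n, m, eta):
    """Bound (a): valid without independence of X and Y."""
    return 1 - (k - 1) / eta ** 2 * (1 / math.sqrt(n) + 1 / math.sqrt(m)) ** 2


def bound_c(k, n, m, eta):
    """Bound (c): product form obtained under independence."""
    return (1 - 4 * (k - 1) / (eta ** 2 * n)) * (1 - 4 * (k - 1) / (eta ** 2 * m))


def a_beats_c(k, n, m, eta):
    """Criterion stated in Theorem V for (a) to be more precise than (c)."""
    return 3 * (m + n) - 2 * math.sqrt(m * n) > 16 * (k - 1) / eta ** 2


if __name__ == "__main__":
    for k, n, m, eta in [(3, 1000, 1000, 0.5), (3, 20, 20, 0.5), (5, 50, 400, 0.3)]:
        print(k, n, m, eta,
              round(bound_a(k, n, m, eta), 4),
              round(bound_c(k, n, m, eta), 4),
              a_beats_c(k, n, m, eta))
```

The same functions apply to (b) and (d) if `eta` is replaced by $\delta_0 - \eta$.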

Further, if it is known that $X$ and $Y$ are independent of each other and $F$ and
$G$ belong to a class $(C_n)$, then we can make use of the following theorem.

Theorem VI. Let $X$ and $Y$ be independent, and $F$ and $G$ belong to a class $(C_n)$,
and let $\eta$ be any positive number smaller than $\delta_0$. Then, when $\|F - G\| = 0$, we
have the following asymptotic inequalities

$$\Pr\{ \|S_n - S'_m\| < \eta \} \ge \Pr\left\{ \frac{1}{2n} \chi^2_{(k-1)} + \frac{1}{2m} \chi'^2_{(k-1)} < \eta^2 \right\} \ge 1 - \frac{k - 1}{2\eta^2} \left( \frac{1}{n} + \frac{1}{m} \right),$$

$$\Pr\{ \|S_n - S'_m\| < \eta \} \ge \Pr\{ \chi^2_{(k-1)} < n\eta^2 \} \Pr\{ \chi'^2_{(k-1)} < m\eta^2 \},$$

and when $\|F - G\| \ge \delta_0$,

$$\Pr\{ \|S_n - S'_m\| \ge \eta \} \ge \Pr\left\{ \frac{1}{2n} \chi^2_{(k-1)} + \frac{1}{2m} \chi'^2_{(k-1)} \le (\delta_0 - \eta)^2 \right\} \ge 1 - \frac{k - 1}{2(\delta_0 - \eta)^2} \left( \frac{1}{n} + \frac{1}{m} \right)$$

and

$$\Pr\{ \|S_n - S'_m\| \ge \eta \} \ge \Pr\{ \chi^2_{(k-1)} \le n(\delta_0 - \eta)^2 \} \Pr\{ \chi'^2_{(k-1)} \le m(\delta_0 - \eta)^2 \},$$

where $\chi^2_{(k-1)}$ and $\chi'^2_{(k-1)}$ are independent random variables each having the
chi-square distribution with $k - 1$ degrees of freedom.


7. Estimation problem. Now, let us turn to the problem of estimation. In this
case too, as mentioned in the introduction, we confine ourselves to discrete
distributions. Then, if $(n_1, \dots, n_k)$, $S_n$, and $F$ denote $n$ observations, the
empirical distribution, and the true distribution, respectively, we have for any
positive $t$

$$\Pr\left\{ \|F - S_n\|^2 < \frac{(k - 1)t}{n} \right\} \ge 1 - \frac{1}{t}.$$

This means that

(7.1) $$\|F - S_n\|^2 = \sum_{i=1}^{k} \left( \sqrt{\frac{n_i}{n}} - \sqrt{p_i} \right)^2 < \frac{(k - 1)t}{n}$$

is a confidence band for an unknown $F$ with confidence coefficient at least $1 - 1/t$.

If the form of $F$ is known and only the parameters $\alpha_1, \dots, \alpha_s$ involved in $F$
are unknown, the confidence intervals for $\alpha_1, \dots, \alpha_s$ are also obtained from (7.1).
When it is known that the unknown $F$ is in a class $(C_n)$, the relation

$$\Pr\{ \|F - S_n\|^2 < \delta^2 \} = \Pr\{ \chi^2_{(k-1)} < 4n\delta^2 \}$$

can be used to obtain a confidence band.

As to point estimation (see also [4], [5], [7], [8]), one can estimate parameters
by minimizing $\|F - S_n\|^2$ under the restriction $\sum p_i = 1$. Let $F_{e,n}$ be the
distribution with these estimates replacing the parameters. Then one can show that
these estimates converge stochastically to the parameters $\alpha_1, \dots, \alpha_s$ in $F$,
respectively,³ by means of the following relations:

$$\|F - F_{e,n}\| \le \|F - S_n\| + \|F_{e,n} - S_n\| \le 2 \|F - S_n\|,$$

$$\Pr\left\{ \|F - F_{e,n}\| < 2 \sqrt{\frac{(k - 1)t}{n}} \right\} \ge \Pr\left\{ \|F - S_n\| < \sqrt{\frac{(k - 1)t}{n}} \right\} \ge 1 - \frac{1}{t}.$$


Of course, we assume here that the parameters depend continuously on the
distribution $F$. Further, the inequality

$$\Pr\left\{ \|F_{e,n} - S_n\|^2 < \frac{(k - 1)t}{n} \right\} \ge 1 - \frac{1}{t} \quad (t > 0)$$

or the asymptotic equality

$$\Pr\{ \|F_{e,n} - S_n\|^2 < \delta^2 \} = \Pr\{ \chi^2_{(k-s-1)} < 4n\delta^2 \}$$

can be used for the problem of fit when the specified distribution contains certain
(say, $s$) unknown parameters. This inequality and equality can easily be
proved. For example, the last equality is obtained as follows. When
$F = (p_1, \dots, p_k)$ with $p_i > 0$ $(i = 1, \dots, k)$ and $n$ is sufficiently large, we
have

$$\|F - S_n\|^2 = \chi^2 / 4n$$

and

$$\|F_{e,n} - S_n\|^2 = \chi^2_{e,n} / 4n,$$

where $\chi^2_{e,n}$ is $\chi^2$ with the minimum $\chi^2$ estimates of the unknown parameters
replacing the parameters. As $\chi^2_{e,n}$ has asymptotically the chi-square distribution
with $k - s - 1$ degrees of freedom, we obtain the above asymptotic equality.

³ We can, further, prove that the convergence here is almost sure. See [4], [5].
Summary. For $i = 1, 2, \dots, n$, let $N_i$ independent trials be made of an event
with probability $p_i$, and suppose that the probabilities $p_i$ are known to satisfy
the inequalities $p_1 \ge p_2 \ge \dots \ge p_n$. Let $a_i$ denote the number of
successes in the $i$-th trial, and $p_i^*$ the ratio $a_i/N_i$ $(i = 1, 2, \dots, n)$.
Then the maximum likelihood estimates $\bar p_1, \dots, \bar p_n$ of the numbers
$p_1, \dots, p_n$ may be found in the following way. If
$p_1^* \ge p_2^* \ge \dots \ge p_n^*$, then $\bar p_i = p_i^*$, $i = 1, 2, \dots, n$.
If $p_k^* \le p_{k+1}^*$ for some $k$ $(k = 1, 2, \dots, n - 1)$, then
$\bar p_k = \bar p_{k+1}$; the ratios $p_k^* = a_k/N_k$ and
$p_{k+1}^* = a_{k+1}/N_{k+1}$ are then replaced in the sequence
$p_1^*, p_2^*, \dots, p_n^*$ by the single ratio $(a_k + a_{k+1})/(N_k + N_{k+1})$,
obtaining an ordered set of only $n - 1$ ratios. This procedure is repeated until an
ordered set of ratios is obtained which are monotone non-increasing. Then for each
$i$, $\bar p_i$ is equal to that one of the final set of ratios to which the original
ratio $a_i/N_i$ contributed. It is seen that this method of calculating the
$\bar p_1, \dots, \bar p_n$ depends on a grouping of observations which might very
well appeal to an investigator on purely intuitive grounds. It seems of interest to
note that it yields the maximum likelihood estimates of the desired probabilities.
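The pooling procedure described in the summary can be sketched in a few lines. The following is a minimal illustration, not the authors' own computing scheme; the function name `pooled_mle` and the use of exact fractions are assumptions made for this example only.

```python
from fractions import Fraction

def pooled_mle(successes, trials):
    """Non-increasing estimates obtained by pooling adjacent ratios.

    successes[i], trials[i] are the counts at the i-th observation point,
    listed in the order in which the true probabilities are non-increasing.
    (Hypothetical helper written for illustration.)
    """
    # Each block is [pooled successes, pooled trials, number of points merged].
    blocks = []
    for a, n in zip(successes, trials):
        blocks.append([a, n, 1])
        # Pool while the previous ratio is <= the newest ratio; the ratios
        # are compared by cross-multiplication so everything stays exact.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] <= blocks[-1][0] * blocks[-2][1]):
            a2, n2, c2 = blocks.pop()
            blocks[-1][0] += a2
            blocks[-1][1] += n2
            blocks[-1][2] += c2
    estimates = []
    for a, n, count in blocks:
        estimates.extend([Fraction(a, n)] * count)
    return estimates

# Ratios 3/4, 1/4, 2/4, 0/4: the second and third are out of order and are pooled.
example = pooled_mle([3, 1, 2, 0], [4, 4, 4, 4])
```

In this example the second and third ratios are replaced by the single ratio (1 + 2)/(4 + 4), and every original point inherits the value of the pooled ratio to which it contributed.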


Particular examples of this situation are found in bio-assay [3] and in the 
proximity fuze problem discussed by M. Friedman ([1], Chapter 11).


The last section is devoted to a consistency property of the maximum likelihood 
estimators. 


1. Introduction. In ordinary sampling one observes directly values of a random 
variable. There are, however, certain investigations, of which examples are to be 
found in a number of different fields, in which the result of each observation is 
not a sample value of the random variable being tested, but only a number, 
together with the information that the sample value is less than, or is greater 
than, that number. Bio-assay furnishes an example ([3]; for further references 
see [3], p. 416; [1], p. 352). Certain other examples occurring in the biological 
sciences have been suggested to the authors. Still another situation of this kind 
is mentioned by M. Friedman ([1], Chapter 11). Given a population of proximity
fuzes, one is interested in the distribution of the random variable $\mathbf{t}$,
maximum distance from target at which a proximity fuze will operate. The result of
a test of an individual proximity fuze is the distance of its nearest approach to
the target and the information that it did or did not operate (we assume that the
proximity fuze will not operate before reaching its point of closest approach to
the target. We do not know whether or not this is in fact true of any actual
proximity fuze; cf. [1], Chapter 11). The result of such an observation is
therefore not a sample value of the random variable, $\mathbf{t}$, but rather a
distance, $t_0$, and the information that the sample value of $\mathbf{t}$
corresponding to the particular proximity fuze is less than $t_0$ (if it did not
operate) or at least $t_0$ (if it did operate).

Let $F(t) = \Pr\{\mathbf{t} < t\}$; $F(t)$ is the distribution function of the
random variable $\mathbf{t}$. Let $p(t) = 1 - F(t) = \Pr\{\mathbf{t} \ge t\}$;
$p(t)$ represents the probability that the fuze will operate if its minimum
distance from the target is $t$. Suppose $R$ fuzes are tested, and observed to pass
within distances $t_1, t_2, \dots, t_n$ of the target ($n \le R$; several may have
the same minimum distance from target); for convenience suppose the $\{t_i\}_1^n$
are arranged in increasing order. The $R$ tests may be regarded as a set of $R$
independent trials of events having probabilities $p_i = p(t_i)$
$(i = 1, 2, \dots, n)$ of success (those observed at the same minimum distance from
target having the same a priori probability of operating), if the term "success" is
used to signify that the proximity fuze operated. The problem is to estimate the
probabilities $\{p_i\}_1^n$ from the results of the $R$ trials.

In a typical bio-assay situation, a large number of trials is made at each
parameter value $t_i$ $(i = 1, 2, \dots, n)$. In such a situation the ratios, number
of successes divided by number of trials, each determined for a particular
parameter value, will with high probability be in monotone non-increasing order
(assuming $t_1 \le t_2 \le \dots \le t_n$). The "best" estimates of the
probabilities are then these ratios, and if $\bar p(t)$ is a non-increasing function
assuming these values at the points $\{t_i\}_1^n$ then $\bar F(t) = 1 - \bar p(t)$
is an obvious empirical distribution function. In other cases, such as that
discussed above, one might expect a small number of trials corresponding to each
parameter value, so that the average numbers of successes could not be expected to
be in monotone order. It is for such situations that the maximum likelihood
estimators of the probabilities $\{p(t_i)\}_1^n$ are determined in this paper. If
$\{\bar p_i\}_1^n$ denotes the set of maximum likelihood estimates, and if
$\bar p(t)$ is a monotone non-increasing function such that $\bar p(t_i) = \bar p_i$
$(i = 1, 2, \dots, n)$ then $\bar F(t) = 1 - \bar p(t)$ will be termed an empirical
distribution function.

In bio-assay situations it is often assumed that the random variable in question
(perhaps after an elementary transformation) is normally distributed. Methods
of probit analysis ([1], [2], [3]) have been developed for use with such an
assumption. While it is true that an empirical distribution function may be useful
in determining parameters of a normal distribution under such an assumption,
the primary purpose of this paper is to present estimators of the probabilities
$\{p(t_i)\}_1^n$ without reference to any assumption as to the distribution of the
random variable being tested. These estimators are derived in Section 2. The
calculations required for their computation are extremely simple and rapid. In
Section 3, the consistency of the estimators is considered. A theorem is proved
which states that the empirical distribution function,
$\bar F(t) = 1 - \bar p(t)$, converges in probability to the distribution function
$F(t)$ as the number of tests or trials becomes infinite in an appropriate way.
2. Maximum likelihood estimators of the probabilities. Let $a_i + b_i$
independent trials be made corresponding to the same parameter value, or
observation point, $t_i$, of which $a_i$ are successes $(i = 1, 2, \dots, n)$. If
$p_i = p(t_i)$ denotes the probability of success in a trial corresponding to the
parameter value $t_i$ $(i = 1, 2, \dots, n)$, then the a priori probability of the
event that, for each integer $i$, $1 \le i \le n$, a specified $a_i$ of the trials
will result in success is

(2.1) $\Pi = \prod_{i=1}^{n} p_i^{a_i}(1 - p_i)^{b_i}.$

Since $p(t)$ is non-increasing, the $\{p_i\}_1^n$ are known to satisfy the relations

(2.2) $1 \ge p_1 \ge p_2 \ge \dots \ge p_n \ge 0.$

The maximum likelihood estimates of $\{p_i\}_1^n$ are those numbers,
$\{\bar p_i\}_1^n$, which maximize the probability $\Pi$ subject to the relations
(2.2). (These estimates also maximize the probability,

$\prod_{i=1}^{n} \binom{a_i + b_i}{a_i} p_i^{a_i}(1 - p_i)^{b_i},$

that for each $i$, $1 \le i \le n$, there will be $a_i$ successes among the
$a_i + b_i$ trials at the observation point $t_i$).

In the context of the above discussion the numbers $a_i$ and $b_i$
$(i = 1, 2, \dots, n)$ are non-negative integers. In Section 3 they will be so
regarded. However, the discussion of this section requires only that they be
non-negative real numbers, such that $a_i + b_i > 0$ $(i = 1, 2, \dots, n)$.


Let $\mathcal{B}_n$ denote the class of sets of real numbers $\{p_i\}_1^n$
satisfying the inequalities (2.2). The problem is to determine a set
$\{\bar p_i\}_1^n$ in $\mathcal{B}_n$ affording a maximum value to $\Pi$:

(2.3) $\prod_{i=1}^{n} \bar p_i^{\,a_i}(1 - \bar p_i)^{b_i} = \max_{\{p_i\} \in \mathcal{B}_n} \prod_{i=1}^{n} p_i^{a_i}(1 - p_i)^{b_i}.$

Lemma 2.1. There is a maximizing set $\{\bar p_i\}_1^n$.
This follows immediately from the observation that the product is a continuous
function of its arguments $p_1, p_2, \dots, p_n$ and hence assumes its maximum on
the closed, bounded set described by inequalities (2.2).
Set

(2.4) $p_i^* = a_i/(a_i + b_i).$

Theorem 2.1. If $\{\bar p_i\}_1^n$ is a maximizing set, and if
$\bar p_k > \bar p_{k+1}$ for some $k$, $1 \le k < n$, then
$p_k^* \ge \bar p_k > \bar p_{k+1} \ge p_{k+1}^*$. Also, $p_1^* \le \bar p_1$, and
$p_n^* \ge \bar p_n$.

Proof. We prove first that $p_k^* \ge \bar p_k$. The basis of the proof is the
observation that the function $p^a(1 - p)^b$ increases for $0 \le p < a/(a + b)$ and
decreases for $a/(a + b) < p \le 1$. Suppose $\bar p_k > p_k^*$. Choose
$p_k' = \max(p_k^*, \bar p_{k+1})$. Then
$\bar p_1 \ge \bar p_2 \ge \dots \ge \bar p_{k-1} > p_k' \ge \bar p_{k+1} \ge \dots \ge \bar p_n$,
while

$p_k'^{\,a_k}(1 - p_k')^{b_k} > \bar p_k^{\,a_k}(1 - \bar p_k)^{b_k}.$

This means that $\Pi$ is increased by replacing $\bar p_k$ by $p_k'$ (note that
$\max \Pi > 0$), contrary to (2.3). Therefore $p_k^* \ge \bar p_k$. Similarly
$p_{k+1}^* \le \bar p_{k+1}$. Hence
$p_k^* \ge \bar p_k > \bar p_{k+1} \ge p_{k+1}^*$. The proof of the last statement
of the theorem is similar.
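For integers $r$, $s$, with $1 \le r \le s \le n$, define

(2.5) $\alpha(r, s) = \sum_{i=r}^{s} a_i, \qquad \beta(r, s) = \sum_{i=r}^{s} b_i,$

$A(r, s) = \alpha(r, s)/[\alpha(r, s) + \beta(r, s)].$

Theorem 2.2. For $1 \le i \le n$, we have

$\bar p_i = \min_{1 \le r \le i} \max_{i \le s \le n} A(r, s) = \max_{i \le s \le n} \min_{1 \le r \le i} A(r, s)$

$\phantom{\bar p_i} = \min_{1 \le r \le i} \max_{r \le s \le n} A(r, s) = \max_{i \le s \le n} \min_{1 \le r \le s} A(r, s).$
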
The original proof, based on Theorem 2.1, is omitted. The reader is referred to 
the following paper for a simpler proof. 

Corollary 2.1. The maximizing set $\{\bar p_i\}_1^n$ is unique. Each $\bar p_i$
$(i = 1, 2, \dots, n)$ is determined uniquely by any of the formulas in Theorem 2.2.

Theorem 2.2 gives explicit formulas for the determination of the $\{\bar p_i\}$, but
these are not recommended for calculation. Theorem 2.1 provides a means of
calculating the maximizing set, $\{\bar p_i\}_1^n$, as outlined in the summary,
which is very fast even for moderately large $n$.
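For small $n$ the min-max and max-min expressions of Theorem 2.2 can nevertheless be evaluated by brute force, which gives a convenient check on any faster routine. The sketch below is illustrative only; the function names are hypothetical and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def A(a, b, r, s):
    """Pooled ratio A(r, s) over observation points r..s (1-indexed, inclusive)."""
    alpha = sum(a[r - 1:s])
    beta = sum(b[r - 1:s])
    return Fraction(alpha, alpha + beta)

def minmax_estimate(a, b, i):
    """min over 1 <= r <= i of max over i <= s <= n of A(r, s)."""
    n = len(a)
    return min(max(A(a, b, r, s) for s in range(i, n + 1))
               for r in range(1, i + 1))

def maxmin_estimate(a, b, i):
    """max over i <= s <= n of min over 1 <= r <= i of A(r, s)."""
    n = len(a)
    return max(min(A(a, b, r, s) for r in range(1, i + 1))
               for s in range(i, n + 1))

# Successes a_i and failures b_i at four observation points (made-up data).
a = [3, 1, 2, 0]
b = [1, 3, 2, 4]
by_minmax = [minmax_estimate(a, b, i) for i in range(1, 5)]
by_maxmin = [maxmin_estimate(a, b, i) for i in range(1, 5)]
```

Both lists agree, as the theorem asserts; the cost grows roughly as $n^3$ here, which is why the pooling scheme based on Theorem 2.1 is preferred in practice.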

The following interesting inequality was mentioned by a referee:

$\sum_k (p_k^* - p_k)^2 (a_k + b_k) \ge \sum_k (\bar p_k - p_k)^2 (a_k + b_k).$

Here $p_k^*$ and $\bar p_k$ are as defined above, while $p_1, p_2, \dots, p_n$ is
any set of numbers such that $1 \ge p_1 \ge p_2 \ge \dots \ge p_n \ge 0$. Indeed,
one has

$\sum_k (p_k^* - p_k)^2 (a_k + b_k) \ge \sum_k (\bar p_k - p_k)^2 (a_k + b_k) + \sum_k (p_k^* - \bar p_k)^2 (a_k + b_k),$

as was shown by two of the authors, independently, in more general contexts,
subsequent to the submission of the manuscript. These inequalities show that the
numbers $\bar p_k$ are, on the average (in an obvious sense), closer to the numbers
$p_k$ respectively than are the numbers $p_k^*$.


3. The consistency of the estimators. Let $F(t)$ be the distribution function of
the random variable $\mathbf{t}$ (see Section 1). The probability that $\mathbf{t}$
will assume a value $t$ or greater is given by $p(t) = 1 - F(t)$. The method
discussed in Section 2 provides the maximum likelihood estimates, $\bar p_i$, of
$p(t_i)$ at specified parameter values, or observation points, $t_i$
$(i = 1, 2, \dots, n)$. Let $\bar p(t)$ denote any non-increasing function,
$0 \le \bar p(t) \le 1$, assuming the values $\bar p_i$ at the points $t_i$
$(i = 1, 2, \dots, n)$, and $\bar F(t) = 1 - \bar p(t)$ an empirical distribution
function associated with trials at the observation points $t_i$
$(i = 1, 2, \dots, n)$. If the points $t_1, \dots, t_n$ were to remain fixed and the
number of trials at each to increase indefinitely, it would follow from the strong
law of large numbers that for $k = 1, 2, \dots, n$, $p_k^*$ and $\bar p_k$ converge
with probability 1 to $p_k$. In the following theorem, however, neither $n$ nor the
points $t_1, t_2, \dots, t_n$, nor hence the probabilities $p_k$ need remain fixed.
For a fixed $t_0$, the number of trials made at $t_0$ need not become infinite, nor
need any at all be made at $t_0$. We shall have $\bar p(t_0)$ near $p(t_0)$ with
high probability if only enough trials are made at points near $t_0$, even if only
one trial is made at each point.

The following theorem of Kolmogorov (strong law of large numbers) will be 
useful in establishing such a result. 

Lemma 3.1. (Kolmogorov) Let $y_j$ be a sequence of independent random variables
having expected values $E(y_j)$ and variances $V_j$ $(j = 1, 2, \dots)$. Let
$\epsilon$ be an arbitrary positive number, and $M$ a positive integer. Then


| 


(3.1) Pr sup | : a ly; — Evy;)] | = ‘ > 


j=] 
([4], p. 203).
Theorem 3.1. Let $t_0$ be a continuity point of the distribution function $F(t)$.
Let $\epsilon$, $\eta$ be arbitrary positive numbers. Let $t'$, $t''$ be chosen so
that $t' < t_0 < t''$ and so that $|F(t) - F(t_0)| < \epsilon/2$ for
$t' \le t \le t''$. Then

(3.2) $\Pr\{|\bar F(t_0) - F(t_0)| < \epsilon\} > 1 - \eta$

provided that at least $N$ trials are made between $t'$ and $t_0$ and at least $N$
trials are made between $t_0$ and $t''$, where $N$ is chosen so that


1 


: — ] 4 
(3.3) 2st ay < 0/32. 


N 
Proof. We shall prove first that
$\Pr\{\bar F(t_0) > F(t_0) - \epsilon\} > 1 - \eta/2$ or

(3.4) $\Pr\{\bar p(t_0) < p(t_0) + \epsilon\} > 1 - \eta/2$

provided that at least $N$ trials are made between $t'$ and $t_0$. It can be shown
similarly that $\Pr\{\bar F(t_0) < F(t_0) + \epsilon\} > 1 - \eta/2$, or

(3.5) $\Pr\{\bar p(t_0) > p(t_0) - \epsilon\} > 1 - \eta/2,$

provided that at least $N$ trials are made between $t_0$ and $t''$. Inequality (3.2)
follows from (3.4) and (3.5).

In order to establish (3.4), let $t^* = t_0$ if $t_0$ is an observation point. If
not, let $t^*$ denote the first observation point to the left of $t_0$. Since
$\bar F(t_0) \ge \bar F(t^*)$, or $\bar p(t_0) \le \bar p(t^*)$, it suffices to
prove

(3.6) $\Pr\{\bar p(t^*) < p(t_0) + \epsilon\} > 1 - \eta/2.$


Let the observation points be $\{t_i\}$ $(i = 1, 2, \dots, n)$ with
$t_1 \le t_2 \le \dots \le t_n$. Let $t_m$ be the first observation point to the
right of $t'$. Let $M$ be the number of trials at observation points
$t_m, t_{m+1}, \dots, t_u = t^*$. By hypothesis, $M \ge N$; that is,
$\sum_{i=m}^{u} (a_i + b_i) \ge N$. Order the trials at observation points
$t_m, t_{m+1}, \dots, t_n$ in the order of increasing $t_i$, ordering in an
arbitrary way those occurring at the same observation point. Let
$T_1, T_2, \dots, T_M, T_{M+1}, \dots, T_R$, where $R$ is the total number of
trials, denote the trials so ordered. Let $\{y_j\}$ denote the number of successes
in the trial $T_j$ $(j = 1, 2, \dots, R)$ so that $y_j = 1$ with probability
$p(t_i)$ and $y_j = 0$ with probability $1 - p(t_i)$, where $t_i$ is the
observation point at which the trial $T_j$ occurs. For $j > R$, let $\{y_j\}$ be
independent random variables, each assuming the value $p(t_0)$ with probability 1.
Set $s_k = \sum_{j=1}^{k} y_j$. By Theorem 2.2,

$\bar p(t^*) = \min_{1 \le r \le u} \max_{u \le s \le n} A(r, s).$

Hence

$\bar p(t^*) \le \max_{u \le s \le n} A(m, s)$

($t_m$ is the first observation point to the right of $t'$). The symbol $A(m, s)$
represents the average number of successes in trials starting at $t_m$ and
terminating at $t_s$. Hence as $s$ varies ($s \ge u$) these ratios form a
subsequence of the sequence $s_k/k$ $(k \ge M)$. This implies that

(3.7) $\bar p(t^*) \le \sup_{k \ge M} s_k/k.$

By Lemma 3.1, 


\ 


ae 
2) 


i ie on | s 1 
Pr sup k =5 > Ey) | < <> > Pr ( sup ; rs 2. E(y;) 
k v j= v4 k2>J t jm 


k>M 


64 
>1- 8) > vis Paar 


But $V_j = \mathrm{Var}(y_j) = p(t_i)[1 - p(t_i)] \le \tfrac{1}{4}$, $t_i$ being the
observation point at which the trial $T_j$ occurs. Hence by hypothesis (3.3),


> « Bt aac /9 
Pr{aue | f - ty Bo | s s} > ' Let + al? - 


since $M \ge N$. Further, if $1 \le j \le R$, then
$E(y_j) = p(t_i) < p(t_0) + \epsilon/2$; if $j > R$, then $E(y_j) = p(t_0)$. Hence

$\Pr\{\sup_{k \ge M} s_k/k < p(t_0) + \epsilon\} > 1 - \eta/2.$

By (3.7) it then follows that

(3.6) $\Pr\{\bar p(t^*) < p(t_0) + \epsilon\} > 1 - \eta/2.$


The proof of Theorem 3.1 is completed as indicated immediately following its 
statement. 
ON THE NONCENTRAL BETA-DISTRIBUTION


By J. L. Hodges, Jr.
University of California, Berkeley
1. Summary. A computing formula for the noncentral beta-distribution 
recently published by Nicholson is simplified. A table of figurate numbers is 


provided to facilitate the use of the simplified formula. It is compared for ease 
of use with the method of Tang in several situations. 


2. Simplification of Nicholson's formula. Let
$Y_1 - \theta_1, \dots, Y_{2a} - \theta_{2a}, Z_1, \dots, Z_{2b}$ be independent
normal random variables of zero expectation and unit variance. Let
$S = \sum Y_i^2$, $T = \sum Z_i^2$, $2\lambda = \sum \theta_i^2$. If $\lambda = 0$,
the random variable $X = T/(S + T)$ has the beta-distribution
$\Pr(X \le x) = I_x(b, a)$. We shall say that $X$ has the noncentral
beta-distribution if $\lambda > 0$, and in general denote $\Pr(X \le x)$ by
$B(x; a, b, \lambda)$.

Nicholson [1] has recently derived a closed expression for $B$ in case $b$ is an
integer. In our notation, his result is


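(1) $B(x; a, b, \lambda) = 1 - e^{-\lambda x}\Big\{ I_{1-x}(a, b) + (1 - x)^{a} \sum_{j=1}^{b-1} [x(1 - x)\lambda]^{j} (P_j/j!) \Big\}$

where

(2) $P_j = \sum_{i=0}^{b-j-1} (-1)^i \dfrac{(a + b - 1)(a + b - 2) \cdots (a + j)}{(b - j - i - 1)!\; i!\; (a + i + j)}\, (1 - x)^i.$

It should be noted that our $x$, in conformity with the notation of [2], [3], is the
complement of the $x$ used by [1], [4].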


A main purpose of this paper is to point out the simplification effected in (2)
if it is expressed as a polynomial in $x$ instead of $(1 - x)$. Since

$(1 - x)^i = \sum_{t=0}^{i} (-1)^t \binom{i}{t} x^t,$

the coefficient of $x^t$ in $P_j$ is

(3) $\sum_{i=t}^{b-j-1} (-1)^{i+t} \binom{i}{t} \dfrac{(a + b - 1)(a + b - 2) \cdots (a + j)}{(b - j - i - 1)!\; i!\; (a + i + j)} = \dfrac{(a + b - 1)(a + b - 2) \cdots (a + j)}{(b - j - t - 1)!\; t!} \sum_{i=0}^{b-j-t-1} (-1)^i \binom{b - j - t - 1}{i} \Big/ (a + j + t + i).$

If we now use the easily checked identity

(4) $\sum_{i=0}^{m} (-1)^i \binom{m}{i} \Big/ (c + i) = 1 \Big/ c \binom{c + m}{m},$

we have

(5) $P_j = \sum_{t=0}^{b-j-1} \binom{A + t}{t} x^t, \qquad A = a + j - 1.$
A comparison of (5) with (2) shows the greater simplicity of the new form. In
particular, the coefficients now depend on but two arguments and are readily
tabled. If $a$ is an integer, the coefficients are the widely available
combinatorials; if $a$ is a half-integer, they are figurate numbers. A table of
$\binom{A+t}{t}$ to 7 significant figures is provided for $A = .5(1)19.5$,
$t = 1(1)18$, adequate for computing all of the $P_j$ when $2b \le 40$,
$2a + 2b \le 43$, and for the initial $P_j$ for larger values of $2a$.
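The tabled coefficients are easy to regenerate. The sketch below assumes that the coefficient is the generalized binomial $\binom{A+t}{t} = \prod_{i=1}^{t} (A + i)/i$, which reduces to the ordinary combinatorial when $A$ is an integer, and that $P_j$ is the polynomial $\sum_{t=0}^{b-j-1} \binom{A+t}{t} x^t$ with $A = a + j - 1$. The function names are hypothetical.

```python
from fractions import Fraction

def figurate(A, t):
    """Generalized binomial coefficient C(A + t, t) = prod_{i=1..t} (A + i) / i.

    A may be an integer or a half-integer; exact fractions keep the result exact.
    """
    value = Fraction(1)
    for i in range(1, t + 1):
        value *= (Fraction(A) + i) / i
    return value

def P(j, a, b, x):
    """Polynomial P_j = sum_{t=0}^{b-j-1} C(A + t, t) x^t with A = a + j - 1."""
    A = Fraction(a) + j - 1
    return sum(figurate(A, t) * Fraction(x) ** t for t in range(b - j))

# A few entries of the kind tabled: integer A gives ordinary combinatorials,
# half-integer A gives figurate numbers.
c_integer = figurate(2, 3)              # C(5, 3)
c_half = figurate(Fraction(1, 2), 2)    # (3/2)(5/2)/2
p_example = P(1, Fraction(1, 2), 3, Fraction(1, 2))
```

A floating-point version would serve equally well for seven-figure work; fractions are used here only so that the values can be checked exactly.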


3. Comparison with Tang. The formula (1) may be regarded as the summed
form of a recursion formula due to Tang [4]:

(6) $B(x; a, b, \lambda) = 1 - e^{-\lambda x}(1 - x)^{a+b-1} \sum_{j=0}^{b-1} T_j$

where

(7) $T_0 = 1, \qquad T_1 = \dfrac{x}{1 - x}[(1 - x)\lambda + a + b - 1], \qquad T_j = \dfrac{x}{j(1 - x)}\{[(1 - x)\lambda + a + b - j]T_{j-1} + x\lambda T_{j-2}\}.$

The choice between the polynomial method (1), (5) and the recursion method
(6), (7) will depend on just what computation is in hand.

(i) Consider first the problem of computing an isolated value of (1),
corresponding to given values of $x$, $a$, $b$, $\lambda$ not covered by existing
tables and charts. This is the most familiar use of the noncentral
beta-distribution as it provides values of the power of the analysis of variance
test. It corresponds to the tables of Tang, and also arises in computing the power
of the test of the hypothesis that $\lambda \le \lambda_0$. The polynomial method
(1), (5), even with the coefficients of (5) available, involves
$(b^2 + 3b - 4)/2$ multiplications and divisions. Recursion method (6), (7)
requires $4b - 2$. Thus, Tang's method appears to be superior for $b \ge 6$, with
the advantage increasing rapidly with $b$.

The comparison just made is however not quite fair to the polynomial method
for several reasons. The computation of (5) is somewhat simpler than that of
(7), so that it proceeds more rapidly with less risk of a mistake. Further, a
recursion computation is particularly subject to the accumulation of error, so
more figures must be carried with (7). Finally, it may not be necessary to
compute all of the $P_j$, as the tail of the sum in (1) may be negligible.
Nicholson has given a $\lambda$-uniform bound to the error committed in neglecting
this tail, though in fact the smaller $\lambda$ is, the fewer terms are needed.
With (7), however, all terms are needed even when $\lambda = 0$.

This last advantage of the polynomial method is not as great as it may at first
seem, since the neglected $P_j$ are just those easiest to compute. To illustrate
this point, consider the example given in [1], with $b = 15$, where the first 8
$P_j$ are adequate for two-figure accuracy for all $\lambda$. Here, 103
multiplications are required to calculate all of the $P_j$, of which only 15 are
saved by neglecting those beyond $P_8$. The entire truncated computation requires
113 multiplications and divisions, compared with 60 needed with Tang's formula.
However, for small $\lambda$, fewer than 8 $P_j$'s would suffice.

As a rough summary statement, the polynomial method should be chosen
for the direct computation of an isolated $B$ if either $b$ or $\lambda$ is small;
otherwise, the older method will be quicker.

(ii) Consider next the problem of finding $x$ corresponding to given $a$, $b$,
$\lambda$, and $B$. This problem arises for example, in determining the critical
value of the test for the hypothesis $\lambda \le \lambda_0$, a problem met by the
Incomplete Beta-Function tables if $\lambda_0 = 0$. It is well known that the
noncentral beta-distribution provides the significance levels of this test. As (1)
or (6) cannot be explicitly inverted, we must essentially calculate $B$ for several
values of $x$, with $a$, $b$, $\lambda = \lambda_0$ fixed, and then interpolate. The
comparison between the two methods is as before, with Tang's preferred unless $b$
or $\lambda$ is small.

(iii) Finally, consider the problem of finding $\lambda$ for given $a$, $b$, $x$,
and $B$. This arises if we seek to find the smallest $\lambda_0$ for which on the
basis of given $x$ we should accept the hypothesis $\lambda \le \lambda_0$, or if we
wish to provide confidence intervals for $\lambda$. It is the problem solved in
computing the Lehmer tables. As in (ii), we must calculate $B$ for several values
of $\lambda$ with $a$, $b$, $x$ fixed. For this purpose method (1), (5) will usually
be much superior to (6), (7), since we can compute the quantities
$Q_j = (1 - x)^{a}[x(1 - x)]^{j} P_j/j!$ once and for all, as they do not involve
$\lambda$. Then it is relatively easy to find

$B = 1 - e^{-\lambda x}\Big\{ I_{1-x}(a, b) + \sum_{j=1}^{b-1} Q_j \lambda^j \Big\}$

for several values of $\lambda$.
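The two methods can be compared numerically. The sketch below is an illustration under stated assumptions rather than a transcription of either author's computing scheme: the polynomial method is taken as $B = 1 - e^{-\lambda x}\{I_{1-x}(a,b) + \sum_j Q_j \lambda^j\}$ with $Q_j = (1-x)^a [x(1-x)]^j P_j / j!$, the recursion as $T_j = \frac{x}{j(1-x)}\{[(1-x)\lambda + a + b - j]T_{j-1} + x\lambda T_{j-2}\}$ with $T_0 = 1$, and both are checked against the standard representation of the noncentral distribution as a Poisson($\lambda$) mixture of central beta distributions. All function names are hypothetical, $b$ is a positive integer, and $0 < x < 1$.

```python
import math

def gen_binom(A, t):
    """Generalized binomial C(A + t, t) = prod_{i=1..t} (A + i) / i."""
    value = 1.0
    for i in range(1, t + 1):
        value *= (A + i) / i
    return value

def inc_beta(y, a, b):
    """I_y(a, b) for integer b >= 1: y^a * sum_{t<b} C(a + t - 1, t) (1 - y)^t."""
    return y ** a * sum(gen_binom(a - 1, t) * (1 - y) ** t for t in range(b))

def B_polynomial(x, a, b, lam):
    """Polynomial method: the Q_j do not involve lam and could be cached."""
    total = inc_beta(1 - x, a, b)
    for j in range(1, b):
        P_j = sum(gen_binom(a + j - 1, t) * x ** t for t in range(b - j))
        Q_j = (1 - x) ** a * (x * (1 - x)) ** j * P_j / math.factorial(j)
        total += Q_j * lam ** j
    return 1 - math.exp(-lam * x) * total

def B_recursion(x, a, b, lam):
    """Recursion method with T_{-1} = 0 and T_0 = 1."""
    ratio = x / (1 - x)
    t_prev2, t_prev1 = 0.0, 1.0
    total = t_prev1
    for j in range(1, b):
        t_j = (ratio / j) * (((1 - x) * lam + a + b - j) * t_prev1
                             + x * lam * t_prev2)
        total += t_j
        t_prev2, t_prev1 = t_prev1, t_j
    return 1 - math.exp(-lam * x) * (1 - x) ** (a + b - 1) * total

def B_series(x, a, b, lam, terms=100):
    """Independent check: Poisson(lam) mixture of central beta distributions."""
    weight = math.exp(-lam)      # Poisson probability of i = 0
    total = 0.0
    for i in range(terms):
        total += weight * inc_beta(1 - x, a + i, b)
        weight *= lam / (i + 1)  # move on to the next Poisson probability
    return 1 - total

values = [f(0.3, 1.5, 4, 2.0) for f in (B_polynomial, B_recursion, B_series)]
```

When $B$ is wanted for many values of $\lambda$ with $a$, $b$, $x$ fixed, the loop in `B_polynomial` would be split so that the $Q_j$ are computed once, which is the advantage claimed in (iii).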


REFERENCES 

[1] W. L. Nicholson, "A computing formula for the power of the analysis of variance
test," Ann. Math. Stat., Vol. 25 (1954), pp. 607-610.

[2] Emma Lehmer, "Inverse tables of probabilities of errors of the second kind,"
Ann. Math. Stat., Vol. 15 (1944), pp. 388-398.

[3] C. M. Thompson, "Tables of percentage points of the incomplete beta-function,"
Biometrika, Vol. 32 (1941), pp. 168-181.

[4] P. C. Tang, "The power function of the analysis of variance test with tables and
illustrations of their use," Stat. Res. Memoirs, Vol. 2 (1938), pp. 126-149.





[Table of $\binom{A+t}{t}$, $A = .5(1)19.5$, $t = 1(1)18$, to 7 significant figures; the tabulated values are not legible in this copy.]





ON TRANSIENT MARKOV PROCESSES WITH A COUNTABLE
NUMBER OF STATES AND STATIONARY
TRANSITION PROBABILITIES


By David Blackwell
University of California
1. Summary. We consider a Markov process $x_0, x_1, \dots$ with a countable set
$S$ of states and stationary transition probabilities
$p(t \mid s) = P\{x_{n+1} = t \mid x_n = s\}$. Call a set $C$ of states almost
closed if (a) $P\{x_n \in C$ for an infinite number of $n\} > 0$ and (b)
$x_n \in C$ infinitely often implies $x_n \in C$ for all sufficiently large $n$,
with probability one. It is shown that there is a set $(C_1, C_2, \dots)$,
essentially unique, of disjoint almost closed sets such that (a) all except at most
one of the $C_i$ are atomic, that is, $C_i$ does not contain two disjoint almost
closed subsets, (b) the non-atomic $C_i$, if present, contains no atomic subsets,
(c) the process is certain to enter and remain in some set $C_i$. A relation
between the sets $C_i$ and the bounded solutions of the system of equations

(1) $\alpha(s) = \sum_t \alpha(t) p(t \mid s)$

is obtained; in particular there is only one atomic $C_i$ and no non-atomic $C_i$
if and only if the only bounded solution of (1) is $\alpha(t) =$ constant. This
condition is shown to hold if the process is the sum of independent identical
(numerical or vector) variables; whence, for such a process, the probability of
entering a set $J$ infinitely often is zero or one. The results are new only if the
process has transient components. The main tool is the martingale convergence
theorem.


2. The structure theorem. 

Theorem 1. Let $x_0, x_1, \dots$ be a Markov process with a countable set $S$ of
states (we restrict $S$ to those states with a positive probability of being
entered) and stationary transition probabilities. For any subset $I$ of $S$, denote
by $L(I)$, $U(I)$ the events $\liminf \{x_n \in I\}$, $\limsup \{x_n \in I\}$
respectively, by $\mathfrak{M}$ the class of $I$ with $P(U(I)) = 0$, and by
$\mathcal{C}$ the class of $I$ with $L(I) = U(I)$ a.e. If $C \in \mathcal{C}$ and
$C \notin \mathfrak{M}$, $C$ will be called almost closed.

(1) Call a Borel measurable function $f$ on the space $\Omega$ of all infinite
sequences $\omega = (x_0, x_1, \dots)$, $x_n \in S$, invariant if for every
$\omega$, $f(\omega) = f(T\omega)$, where $T(x_0, x_1, \dots) = (x_1, x_2, \dots)$,
and call an event invariant if its characteristic function is invariant (so that,
for any $J \subset S$, $L(J)$ and $U(J)$ are invariant). For any invariant event
$V$ there is a $C \in \mathcal{C}$ such that $U(C) = V$ a.e., so that the Borel
field of invariant events is identical, up to events of probability zero, with the
(Borel) field $\mathcal{D}$ of events of the form $D = U(J)$ a.e., $J \subset S$.

Received November 22, 1954. 
(2) There is a finite or countable collection $\{C_1, C_2, \dots\}$ of disjoint
almost closed sets with the following properties:

(a) every $C_i$ except at most one is atomic, that is, does not contain two
disjoint almost closed subsets,

(b) the non-atomic $C_i$, if present, contains no atomic subsets,

(c) $\sum_i P(L(C_i)) = 1$.

The collection $\{C_1, C_2, \dots\}$ is essentially unique, that is, if
$\{C_1', C_2', \dots\}$ has properties (a), (b), (c), then each $C_i'$ differs from
some $C_j$ by a set in $\mathfrak{M}$.

The $C_i$ may be chosen so that all states in a given $C_i$ are of the same type:
either all states $s$ in $C_i$ are transient, that is, $\{s\} \in \mathfrak{M}$, or
all states in $C_i$ are recurrent, that is, nontransient. The non-atomic $C_i$, if
present, consists entirely of transient states.

Proof. We represent $x_0, x_1, \dots$, as usual, as coordinate variables on the
space $\Omega$. For any bounded measurable $f$ on $\Omega$, we have

(4) $E(T^n f \mid x_0 = s_0, \dots, x_n = s_n) = E(f \mid x_0 = s_n),$

where $T^n f$ denotes the function $g$ defined by $g(\omega) = f(T^n \omega)$. For,
since $T^n f$ depends only on $x_n, x_{n+1}, \dots$, the left side of (4) equals
$E(T^n f \mid x_n = s_n)$ which, because of the stationarity of transition
probabilities, is easily shown to equal $E(f \mid x_0 = s_n)$. Thus if $f$ is
invariant,

(5) $E(f \mid x_0 = s_0, \dots, x_n = s_n) = E(f \mid x_0 = s_n).$

For any invariant event $V$ and any state $s$, define
$\phi(s) = E(v \mid x_0 = s)$, where $v$ is the characteristic function of $V$.
From the forward martingale convergence theorem ([2], p. 319) and (5),
$\phi(x_n) \to v$ with probability one as $n \to \infty$. Thus if $J$ is the set of
all states $s$ with $\phi(s) > \tfrac{1}{2}$, $\omega \in V$ implies
$\omega \in L(J)$ with probability one, while $\omega \notin V$ implies
$\omega \notin U(J)$ with probability one, that is, $U(J) \subset V \subset L(J)$
a.e. Since always $U(J) \supset L(J)$, we have $J \in \mathcal{C}$ and
$U(J) = V$ a.e. This establishes part (1) of the theorem.

For part (2), we decompose the measure $P$ on the Borel field of invariant sets
into atoms and a completely non-atomic part (see for instance [1], p. 565). Let
$V_1, V_2, \dots$ be the sets of this decomposition, and choose
$I_n \in \mathcal{C}$ such that $U(I_n) = V_n$ a.e. Since $V_i$, $V_j$ are disjoint
for $i \ne j$, $I_i \cap I_j \in \mathfrak{M}$ for $i \ne j$, so that, with
$C_n = I_n - \bigcup_{j<n} I_j$, $C_n \in \mathcal{C}$, $U(C_n) = V_n$ a.e., and
$C_1, C_2, \dots$ are disjoint. Properties (a), (b), (c) of part (2) and the
essential uniqueness are immediate. The final assertion of part (1) is a
consequence of the known ([3], p. 322) facts that if $\{s\} \notin \mathfrak{M}$,
that is, if it is possible for the process to enter $s$ infinitely often, then if
the system ever enters $s$ it is certain to enter infinitely often $s$ and all
states which can be entered from $s$. The latter class $C$ of states is almost
closed (in fact is closed, that is, if the system ever enters $C$ it remains in
$C$), consists entirely of recurrent states, and has no almost closed proper
subsets. This completes the proof of the theorem.

Any collection of sets $\{C_1, C_2, \dots\}$ of the form described in part (2) of
Theorem 1 will be called a decomposition of the Markov process. A process will
be called simple if its decomposition consists of a single set $C_1$, and a simple
process will be called non-atomic or atomic according to the type of $C_1$.


3. The equation $\alpha(s) = \sum_t \alpha(t) p(t \mid s)$. For a process of the
type considered in Section 1, write
$p(t \mid s) = P\{x_{n+1} = t \mid x_n = s\}$, so that $p(t \mid s) \ge 0$,
$\sum_t p(t \mid s) = 1$ for all $s$. The structure of the process is closely
related to the nature of the bounded solutions of the equation

(6) $\alpha(s) = \sum_t \alpha(t) p(t \mid s).$

This relation is most simply stated in terms of the bounded invariant functions
of the process, as follows:

Theorem 2. For any bounded invariant function $f$,
$\alpha(s) = E(f \mid x_0 = s)$ satisfies (6), and every bounded solution of (6) may
be represented in this form.

Proof. That any $\alpha(s) = E(f \mid x_0 = s)$ satisfies (6) is clear, since
$E(f \mid x_0 = s, x_1 = t) = \alpha(t)$ from (5), and (6) then follows from the
formula $E(f \mid x_0) = E(E(f \mid x_0, x_1) \mid x_0)$. Conversely if $\alpha(t)$
is any bounded solution of (6), the sequence $z_n = \alpha(x_n)$ is a bounded
martingale and hence converges to a limiting bounded $f$. Since
$z_n(T\omega) = z_{n+1}(\omega)$, $f(T\omega) = f(\omega)$ a.e., that is, $f$ is
invariant. The martingale convergence theorem also yields
$E(f \mid x_0) = z_0 = \alpha(x_0)$, so that the solution $\alpha(t)$ has the
required form.

The inequality

(6') $\alpha(s) \ge \sum_t \alpha(t) p(t \mid s)$

has been studied by Kendall [7] and Foster [4], who related the existence of
solutions of (6') such that $\alpha(j) \to \infty$ as $j \to \infty$ (enumerating
the states by the positive integers) to the existence of a finite closed set of
states.

Corollary. The process is simple and atomic if and only if the only bounded
solution of (6) is $\alpha(t) =$ constant.

Proof. If the only bounded solution of (6) is constant, then any bounded
invariant $f$ is constant, since, with $\alpha(s) = E(f \mid x_0 = s)$,
$\alpha(x_n) \to f$ a.e. as $n \to \infty$. Thus any invariant set has probability
zero or one, and the process is simple and atomic. Conversely, if the process is
simple and atomic, every invariant set has probability zero or one, every bounded
invariant function $f$, being measurable with respect to the class of invariant
sets, is constant a.e., so that every solution of (6), having the form
$E(f \mid x_0 = s)$, is constant.
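A small finite example may make the corollary concrete. The chain below is a made-up illustration, not taken from the paper: a symmetric walk on the states 0, 1, 2, 3 that is absorbed at 0 and at 3. The probability of eventual absorption at 3 is a bounded, non-constant solution of (6), so this process is not simple and atomic; it has two atomic almost closed sets, $\{0\}$ and $\{3\}$.

```python
from fractions import Fraction

half = Fraction(1, 2)

# p[s][t] = p(t | s) for a symmetric walk on {0, 1, 2, 3} absorbed at 0 and 3.
p = {
    0: {0: Fraction(1)},
    1: {0: half, 2: half},
    2: {1: half, 3: half},
    3: {3: Fraction(1)},
}

def satisfies_equation_6(alpha):
    """True if alpha(s) = sum_t alpha(t) p(t | s) holds at every state s."""
    return all(alpha[s] == sum(alpha[t] * prob for t, prob in row.items())
               for s, row in p.items())

# alpha(s) = probability of ending at state 3 when starting from s.
absorbed_at_3 = {s: Fraction(s, 3) for s in p}
constant = {s: Fraction(1) for s in p}
not_a_solution = {s: Fraction(s * s) for s in p}

checks = (satisfies_equation_6(absorbed_at_3),
          satisfies_equation_6(constant),
          satisfies_equation_6(not_a_solution))
```

Here `absorbed_at_3` is $E(f \mid x_0 = s)$ for the invariant function $f$ equal to 1 on paths that end at 3 and 0 otherwise, in line with Theorem 2.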

As an application of the corollary, we consider processes which are sums of 
independent identically distributed variables. 

Theorem 3. Let $V$ be a finite or countable set of vectors $v_1, v_2, \dots$ in
$N$-space, and let $p_1, p_2, \dots$ be positive numbers with sum one. Let $S$
consist of the origin and all vectors representable as $w_1 + \dots + w_n$,
$n = 1, 2, \dots$, $w_i \in V$. The only bounded solution of the equation

(7) $\alpha(s) = \sum_j \alpha(s + v_j) p_j, \qquad s \in S$

is $\alpha(s) =$ constant. Consequently if $y_1, y_2, \dots$ are independent
variables with $P\{y_n = v_j\} = p_j$, the Markov process
$x_k = y_1 + \dots + y_k$, $x_0 = (0, \dots, 0)$ is simple and atomic.

PROOF. Repeated use of (7) yields

(8)    $$a(s) = \sum_{j_1, \dots, j_k} a(s + v_{j_1} + \cdots + v_{j_k})\, p_{j_1} \cdots p_{j_k} = \sum_{r \in R_k} a(s + r_1 v_1 + r_2 v_2 + \cdots)\, q_k(r),$$

where $R_k$ consists of all sequences $r = (r_1, r_2, \dots)$ of nonnegative integers whose
sum is $k$, and

$$q_k(r) = k! \prod_j \left(p_j^{r_j} / r_j!\right).$$

Replacing $s$ by $s + v_1$ and $k$ by $k - 1$ in (8) yields

(9)    $$a(s + v_1) = \sum_{r \in R_{k-1}} a(s + (r_1 + 1) v_1 + r_2 v_2 + \cdots)\, q_{k-1}(r) = \sum_{r \in R_k} a(s + r_1 v_1 + r_2 v_2 + \cdots)\, q_k(r)\,(r_1 / k p_1).$$

Subtracting (9) from (8) yields

(10)    $$a(s) - a(s + v_1) = \sum_{r \in R_k} f(r)\, q_k(r)\,(1 - (r_1 / k p_1)),$$

where $f(r) = a(s + r_1 v_1 + r_2 v_2 + \cdots)$ is for fixed $s$ and $k$ a bounded function of $r$
and is uniformly bounded in $s$, $k$ as well, say $|f(r)| \le M$ for all $s$, $k$, $r$. For fixed
$\epsilon > 0$, let $T_k$ denote that subset of $R_k$ for which $|1 - (r_1 / k p_1)| < \epsilon$. Then

$$\sum_{r \notin T_k} |f(r)|\, q_k(r)\, |1 - (r_1 / k p_1)| \le M p_1^{-1} \sum_{r \notin T_k} q_k(r),$$

and the sum on the right, being the probability that, in $k$ independent trials
with an event of probability $p_1$, the actual success ratio differs from $p_1$ by at
least $\epsilon p_1$, approaches zero as $k \to \infty$ by the law of large numbers. Since

$$\sum_{r \in T_k} |f(r)|\, q_k(r)\, |1 - (r_1 / k p_1)| \le M \epsilon,$$

we find from (10), summing separately for $r \in T_k$ and $r \notin T_k$ and letting $k \to \infty$,
that $|a(s) - a(s + v_1)| \le M \epsilon$. Since $\epsilon$ is arbitrary, $a(s) = a(s + v_1)$. Clearly
the same proof yields $a(s) = a(s + v_j)$ for any $j$, and $a(s)$ = constant.
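
The step from (8) to (9) rests on a reindexing identity for multinomial probabilities, namely that $q_{k-1}$ evaluated at $r$ with $r_1$ lowered by one equals $q_k(r)\,(r_1 / k p_1)$. A minimal sketch that checks this identity in exact arithmetic is given below; the function names and the three step probabilities are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction
from math import factorial

def q(k, r, p):
    """Multinomial probability q_k(r) = k! * prod_j p_j^{r_j} / r_j!, with sum(r) == k."""
    assert sum(r) == k
    value = Fraction(factorial(k))
    for r_j, p_j in zip(r, p):
        value *= p_j ** r_j / factorial(r_j)
    return value

def check_identity(k, r, p):
    """True if q_{k-1}(r with r_1 lowered by 1) == q_k(r) * r_1 / (k * p_1); needs r_1 >= 1."""
    shifted = (r[0] - 1,) + tuple(r[1:])
    return q(k - 1, shifted, p) == q(k, r, p) * Fraction(r[0], k) / p[0]

# Illustrative step probabilities (an assumption made for this example only).
p = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
print(check_identity(5, (2, 2, 1), p))
```

Exact rational arithmetic is used so that the comparison is an equality test rather than a tolerance check.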

COROLLARY. If $y_1, y_2, \dots$ are independent identically distributed (vector or
scalar) variables with a finite or countable set of values, for any set $J$ the probability
that an infinite number of sums $y_1 + \cdots + y_k = x_k$ are in $J$ is zero or one.

An example of a simple nonatomic process is $x_n = \tfrac{1}{2} + \sum_{k=1}^{n} \epsilon_k / 2^{k+1}$, where
$\epsilon_1, \epsilon_2, \dots$ are independent and assume the values $\pm 1$ with probability $\tfrac{1}{2}$ each.
The states are all rational numbers $m/2^n$, $m = 1, 3, \dots, 2^n - 1$, $n = 1, 2, \dots$.
Writing $m/2^n = (m, n)$, the system (7) becomes

(11)    $$a(m, n) = \tfrac{1}{2}\left(a(2m - 1, n + 1) + a(2m + 1, n + 1)\right).$$

If $x^* = \lim x_n$, $x^*$ is a.e. invariant and has a uniform distribution on (0, 1).
Each $x_n$ is a function of $x^*$ a.e., so that every function of $x_0, x_1, x_2, \dots$ is a.e. a
function of $x^*$ and hence invariant a.e. For any bounded function $f(x^*)$,

$$E(f \mid x_{n-1} = (m, n)) = 2^{n-1} \int_{(m-1)/2^n}^{(m+1)/2^n} f(t)\, dt = a(m, n).$$

It is easily verified by substitution that $a(m, n)$ satisfies (11).
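
The substitution can also be checked mechanically. The sketch below assumes the test function $f(t) = t^2$ (any integrable $f$ would do; this choice is purely illustrative) and takes $a(m, n)$ to be $2^{n-1}$ times the integral of $f$ over $((m-1)/2^n, (m+1)/2^n)$, then verifies the averaging relation (11) exactly.

```python
from fractions import Fraction

def a(m, n, power=2):
    """a(m, n) = 2^(n-1) * integral of t^power over [(m-1)/2^n, (m+1)/2^n]."""
    lo = Fraction(m - 1, 2 ** n)
    hi = Fraction(m + 1, 2 ** n)
    integral = (hi ** (power + 1) - lo ** (power + 1)) / (power + 1)
    return 2 ** (n - 1) * integral

def satisfies_11(m, n):
    """Check a(m, n) = (a(2m - 1, n + 1) + a(2m + 1, n + 1)) / 2 exactly."""
    return a(m, n) == (a(2 * m - 1, n + 1) + a(2 * m + 1, n + 1)) / 2

# Check every state (m, n) with m odd, for the first few levels n.
all_ok = all(satisfies_11(m, n) for n in range(1, 6) for m in range(1, 2 ** n, 2))
print(all_ok)
```

The check works because the two child intervals at level $n + 1$ are the left and right halves of the parent interval at level $n$.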

The Corollary actually holds without the restriction to random variables with 
a finite or countable set of values; this is an immediate consequence of an un- 
published theorem of Edwin Hewitt and L. J. Savage, which they communicated 
to the writer. The Hewitt-Savage Theorem, an improved version of the zero-one 
law, asserts that any event depending on a sequence of independent identically 
distributed random variables which is invariant under all permutations of every 
finite set of the random variables has probability zero or one; the event that an 
infinite number of sums $y_1 + \cdots + y_k$ are in $J$ is clearly of this type. The
conclusion of the Corollary, under different hypotheses on the $y_i$, has also been
obtained by Chung and Derman in an unpublished manuscript which they com- 
municated to the writer. 

As a second application of the Corollary of Theorem 2, we obtain an interesting
result of Foster [4], Harris [5], and Hodges and Rosenblatt [6] concerning the
random walk on the nonnegative integers, with $p(0 \mid 0) = 1$, $p(i + 1 \mid i) = p_i$,
$p(i - 1 \mid i) = q_i = 1 - p_i$, $0 < p_i < 1$, $i > 0$. The equation (6) becomes

(12)    $$a(s) = p_s\, a(s + 1) + q_s\, a(s - 1), \qquad s > 0.$$

The general solution of (12) is $a(s) = A + B z_s$, where $z_0 = 0$, $z_1 = 1$,
$z_s = 1 + c_1 + \cdots + c_{s-1}$ for $s > 1$, where $c_1 = q_1/p_1$, $c_j = (q_1 \cdots q_j)/(p_1 \cdots p_j)$ for $j > 1$.
Thus (12) has a bounded nonconstant solution if and only if the series

$$\sum_j (q_1 \cdots q_j)/(p_1 \cdots p_j)$$

converges, which is the condition obtained in [4], [5], and [6] for passage to the
origin to be uncertain.
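
As an illustration of the general solution, the sketch below builds $z_s$ from the products $c_j = (q_1 \cdots q_j)/(p_1 \cdots p_j)$ and checks that it satisfies (12). The constant choice $p_i = 2/3$ is an assumption made only for this example; in that case $c_j = 2^{-j}$, the series converges, and $z_s$ stays bounded.

```python
from fractions import Fraction

def z_values(p, s_max):
    """Return [z_0, ..., z_{s_max}] with z_0 = 0, z_1 = 1, z_s = 1 + c_1 + ... + c_{s-1}.

    p is a function giving the upward probability p_i for each state i >= 1.
    """
    z = [Fraction(0), Fraction(1)]
    c = Fraction(1)
    for j in range(1, s_max):
        c *= (1 - p(j)) / p(j)      # c_j = c_{j-1} * q_j / p_j
        z.append(z[-1] + c)
    return z

def satisfies_12(z, p):
    """Check z_s = p_s z_{s+1} + q_s z_{s-1} for every interior s."""
    return all(
        z[s] == p(s) * z[s + 1] + (1 - p(s)) * z[s - 1]
        for s in range(1, len(z) - 1)
    )

def p_const(i):
    # Illustrative choice: drift away from the origin.
    return Fraction(2, 3)

z = z_values(p_const, 10)
print(satisfies_12(z, p_const), float(z[-1]))
```

With this choice $z_s = 2 - 2^{1-s}$, which tends to 2, so $a(s) = A + B z_s$ is a bounded nonconstant solution.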
ON PARAMETER ESTIMATION FOR TRUNCATED PEARSON
TYPE III DISTRIBUTIONS


By GEORGE GERARD DEN BROEDER, JR.


Purdue University


Summary and introduction. The problem of estimating the parameters of the
Pearson Type III probability density function (p.d.f.)

$$\varphi(t, a) = a f(at) = [\Gamma(p)]^{-1} a^p t^{p-1} e^{-at}, \qquad 0 < t, \quad 0 < a, \quad 0 < p,$$

assuming various forms of truncation has been considered recently by A. C.
Cohen, Jr. [2], Des Raj [4] and others. In this paper we obtain maximum
likelihood estimates of the parameter $a$ when $p$ is known a priori. Truncation is at a
known point $T > 0$.

Four cases are considered: truncation to the right of $T$ with the number of
observations in the region of truncation (1) known, and (2) not known; and 
truncation to the left of T with the number of observations in the region of 
truncation (3) known, and (4) not known. The information lost in not knowing 
the number of observations in the regions of truncation is measured in terms of 
the R. A. Fisher indices of information. 

The study of case (2), and hence of case (1), is an outgrowth of the author’s
experience with a life testing program from which, unfortunately, data had been 
recorded only for those specimens which failed within 100 hours of testing. De- 
spite this anomaly of experimental design a maximum likelihood estimate of a 
was found to exist. The analysis proceeded on the assumption of an exponential 
failure law (p = 1). 

Another instance of case (2) arises naturally in connection with the distribution 
of population in urban communities. Colin Clark [1] has found urban population 
density, as a function of radial distance from city center, to be adequately de- 
scribed by the Pearson Type III p.d.f. with p = 2. The maximum likelihood 
estimate of the unspecified parameter, a, for cities with circular peripheries, is 
contained in Section 2. 

Cases (3) and (4) are included for the sake of completeness. A possible area of 
application is to be found in the field of telemetry where frequently the result of 
a random experiment is measured by an instrument which responds only to 
inputs in excess of a fixed magnitude. 


1. Number of observations to the right of $T$ known. Suppose we have a random
sample of size $N$ from the Pearson Type III distribution with known $p$.
Suppose further that $N - n$ observations are known only to be greater than
$T > 0$ while the actual values, $t_k \le T$, of the remaining $n$ observations are given.
The likelihood function of the sample is then

(1.1)    $$L(t_k, n) = \begin{cases} [1 - F(aT)]^{N} & \text{for } n = 0, \\ \binom{N}{n} [1 - F(aT)]^{N-n} \prod_{k=1}^{n} [a f(a t_k)] & \text{for } n > 0, \end{cases}$$

where $0 \le t_k \le T$ and $F(x) = \int_0^x f(x)\, dx$. Without loss of generality we take
$T = 1$. Obviously, for $n = 0$, $L(t_k, n)$ is decreasing in $a$. For $n > 0$, we obtain

(1.2)    $$\frac{\partial \log L}{\partial a} = n\left[p/a - \left(\frac{N - n}{n}\right) Z(a) - \bar{t}\,\right],$$

where $\bar{t} = (1/n) \sum_{k=1}^{n} t_k$ and where $Z(a) = f(a)[1 - F(a)]^{-1}$ is the reciprocal
of Mills’ ratio for the Pearson Type III distribution. The maximum likelihood
estimate of $a$, say $\hat{a}$, is thus the unique solution of $\bar{t} = a^{-1}[p - ((N - n)/n)\, a Z(a)]$.
The uniqueness and existence of $\hat{a}$ are established using the result, shown by the
author in an unpublished paper [3], that $a Z(a)$ is increasing with range $(0, \infty)$ for
all $p > 0$. Thus, since both $a^{-1}$ and $[p - ((N - n)/n)\, a Z(a)]$ are decreasing, the
right-hand side of the last equation has a unique zero (except, of course, when
$n = N$) for some $a_0 > 0$ and is decreasing with range $(0, \infty)$ for $0 < a < a_0$.
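
For the exponential failure law ($p = 1$) the reciprocal of Mills' ratio is identically one, so the likelihood equation of this section reduces to $\bar{t} = 1/a - (N - n)/n$ and has the closed-form root $n / (\sum t_k + N - n)$ when $T = 1$. The sketch below, which assumes $p = 1$, $T = 1$ and invented data, solves the score equation by bisection and compares the result with that closed form. The function names are illustrative only.

```python
import math

def f(x):
    """Density for p = 1 (exponential case): f(x) = exp(-x)."""
    return math.exp(-x)

def survival(x):
    """1 - F(x) for p = 1."""
    return math.exp(-x)

def score(a, t, N):
    """Derivative of the log likelihood in a, with T = 1 and p = 1."""
    n = len(t)
    t_bar = sum(t) / n
    Z = f(a) / survival(a)          # reciprocal of Mills' ratio; equals 1 when p = 1
    return n * (1.0 / a - (N - n) / n * Z - t_bar)

def mle_right_censored(t, N, lo=1e-9, hi=100.0, iterations=200):
    """Bisection for the zero of the score, which is decreasing in a."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if score(mid, t, N) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical data: three observed values below T = 1, two known only to exceed T.
t_obs = [0.2, 0.5, 0.9]
N_total = 5
a_hat = mle_right_censored(t_obs, N_total)
closed_form = len(t_obs) / (sum(t_obs) + (N_total - len(t_obs)))
print(a_hat, closed_form)
```

For general $p$ the same bisection applies once $f$ and the survival function are replaced by the gamma density and its tail, which requires an incomplete gamma routine not shown here.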


2. Number of observations to the right of $T$ not known. When only $n$ is known
we are led to consider the likelihood function

(2.1)    $$L^*(t_k \mid n) = [F(a)]^{-n} \prod_{k=1}^{n} [a f(a t_k)], \qquad 0 \le t_k \le 1,$$

from which we obtain

(2.2)    $$\frac{\partial \log L^*}{\partial a} = n\left[p/a - W(a) - \bar{t}\,\right],$$

where $W(a) = f(a)[F(a)]^{-1}$. The maximum likelihood estimate of $a$, say $a^*$,
is thus a solution of $\bar{t} = p/a - W(a)$.

The pertinent properties of $[p/a - W(a)]$ are readily established by considering
the random variable $\xi$ with p.d.f.

(2.3)    $$g(t; a) = a f(at)[F(a)]^{-1}, \qquad 0 \le t \le 1.$$

We find that $E(\xi) = p/a - W(a)$ and that $\mathrm{Var}(\xi) = -dE(\xi)/da > 0$. Hence,
$E(\xi)$ is decreasing in $a$ and the uniqueness of $a^*$ is assured. (We note that $a^*$ is
obtained also by the method of moments.) The range of $E(\xi)$ may be established
by determining its limits from the equivalent expression

(2.4)    $$E(\xi) = p/a - W(a) = \left[\int_0^a x^{p} e^{-x}\, dx\right] \left[a \int_0^a x^{p-1} e^{-x}\, dx\right]^{-1}.$$

Obviously, $\lim_{a \to \infty} E(\xi) = 0$. To determine its limit as $a$ approaches zero, we
apply L’Hospital’s rule twice, obtaining

(2.5)    $$\lim_{a \to 0} E(\xi) = \frac{p}{p + 1}.$$

This latter result, (2.5), introduces an interesting complication since $0 \le \bar{t} \le 1$
and $P(\bar{t} \ge p/(p + 1)) > 0$. Thus $\bar{t} = E(\xi)$ appears to have no solution for
$\bar{t} \ge p/(p + 1)$. However, if we interpret $L^*(t_k \mid n)$ to be the likelihood function
associated with the random variable $\xi$ introduced above and then complete the
family of p.d.f.’s (2.3) by adjoining

(2.6)    $$g(t; 0) = \lim_{a \to 0} g(t; a) = p\, t^{p-1}, \qquad 0 \le t \le 1,$$

the likelihood function, $L^*(t_k \mid n)$, is seen to be maximized for $\bar{t} \ge p/(p + 1)$ by
$a = 0$.
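
The estimate of this section can be sketched numerically for the exponential case $p = 1$, where the mean of the truncated variable is $1/a - W(a)$ with $W(a) = e^{-a}/(1 - e^{-a})$ and the limiting value $p/(p + 1)$ is one-half. The code below assumes $p = 1$ and $T = 1$, uses invented data, and returns zero when the sample mean is at or above the limiting value, following the completed family described above. Function names and the bracketing interval are assumptions of the example.

```python
import math

def W(a):
    """W(a) = f(a) / F(a) for p = 1, written to stay stable for large a."""
    return math.exp(-a) / (-math.expm1(-a))

def mean_xi(a):
    """Mean of the truncated variable for p = 1: 1/a - W(a), decreasing in a."""
    return 1.0 / a - W(a)

def truncated_mle(t, lo=1e-6, hi=1e6, iterations=200):
    """Solve t_bar = mean_xi(a) by bisection; return 0.0 if no positive root exists."""
    t_bar = sum(t) / len(t)
    if t_bar >= mean_xi(lo):        # sample mean at or above the limit p/(p+1) = 1/2
        return 0.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mean_xi(mid) > t_bar:    # mean still too large, so a must increase
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

sample = [0.1, 0.3, 0.4]            # hypothetical observations in (0, 1)
a_star = truncated_mle(sample)
print(a_star, mean_xi(a_star))
print(truncated_mle([0.6, 0.7]))    # sample mean above 1/2
```

The second call illustrates the boundary case: no positive $a$ gives a mean as large as 0.65, so the sketch reports $a = 0$.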


3. Number of observations to the left of $T$ known. In the event that $N - n$
observations are known only to be less than $T$ while the actual values, $t_k \ge T$,
of the remaining $n$ observations are given, we seek to maximize (again taking
$T = 1$),

(3.1)    $$S(t_k, n) = \begin{cases} [F(a)]^{N} & \text{for } n = 0, \\ \binom{N}{n} [F(a)]^{N-n} \prod_{k=1}^{n} [a f(a t_k)] & \text{for } n > 0, \end{cases}$$

with $t_k \ge 1$. For $n = 0$, $S(t_k, n)$ is increasing in $a$. For $n > 0$ we obtain

$$d \log S / da = n\left[p/a + \left(\frac{N - n}{n}\right) W(a) - \bar{t}\,\right],$$

which has a unique zero since, for all $p > 0$, $W(a)$ is decreasing and
$\lim_{a \to \infty} W(a) = 0$, [3].


4. Number of observations to the left of $T$ not known. Knowing only the values
of the $n$ observations, $t_k \ge T$, we are required to consider the likelihood function

(4.1)    $$S^*(t_k \mid n) = [1 - F(a)]^{-n} \prod_{k=1}^{n} [a f(a t_k)], \qquad t_k \ge 1,$$

from which we obtain

(4.2)    $$\frac{d \log S^*}{da} = n\left[p/a + Z(a) - \bar{t}\,\right].$$

That (4.2) has a unique zero is established in a manner analogous to that of
Section 2. Consider a random variable $\eta$ with p.d.f.

(4.3)    $$h(t; a) = a f(at)[1 - F(a)]^{-1}, \qquad t \ge 1.$$

Then $E(\eta) = p/a + Z(a)$, and $\mathrm{Var}(\eta) = -dE(\eta)/da > 0$. That is, $E(\eta)$ is a
decreasing function of $a$ with range $(1, \infty)$, ($\lim_{a \to \infty} Z(a) = 1$, [3]). Hence,
since $\bar{t} \ge 1$, (4.2) has a unique zero.


5. Concerning information losses. The likelihood function, $L^*(t_k \mid n)$, of
Section 2 admits two interpretations. In one instance (urban population density)
it is the likelihood function of the random variable $\xi$ with p.d.f. (2.3). In another
instance (anomalous experimental design) it is a “conditional” likelihood
function of $L(t_k, n)$:

(5.1)    $$L(t_k, n) = \binom{N}{n} [1 - F(a)]^{N-n} [F(a)]^{n} L^*(t_k \mid n), \qquad 0 \le t_k \le 1.$$

It is this latter instance which involves loss of information and with which we are
concerned in this section.

The information lost in not knowing $N$ may conveniently be measured in
terms of R. A. Fisher indices of information. A measure which suggests itself
and which is adequate for our purposes is

(5.2)    $$J(L) = E\left[\frac{\partial^2 \log L^*}{\partial a^2} - \frac{\partial^2 \log L}{\partial a^2}\right],$$

where the expectation is with respect to the p.d.f., $L(t_k, n)$. The analog of (5.2),
$J(S)$, measures the information loss in the case of truncation on the left. Our
only justification for employing the difference of the R. A. Fisher indices rather
than, say, their ratio rests with the nature of the results obtained.

Denoting $-E(\partial^2 \log L / \partial a^2)$ and $-E(\partial^2 \log L^* / \partial a^2)$ by $I(L)$ and $I(L^*)$
respectively and differentiating (1.2) and (2.2) with respect to $a$, we obtain

$$I(L) = (p/a^2) E(n) + Z'(a) E(N - n) = N[(p/a^2) F(a) + Z'(a)(1 - F(a))],$$

and

$$I(L^*) = [p/a^2 + W'(a)] E(n) = N[p/a^2 + W'(a)] F(a).$$

Hence

$$J(L) = N[Z'(a)(1 - F(a)) - W'(a) F(a)] = N W(a) Z(a).$$

Similarly

$$I(S) = N[(p/a^2)(1 - F(a)) - W'(a) F(a)],$$
$$I(S^*) = N[p/a^2 - Z'(a)][1 - F(a)],$$

so that $J(S) = N W(a) Z(a)$, and we see that our measure of information loss in
not knowing the number of observations in the tail is independent of whether
truncation is on the right or on the left.

Finally, we recall that the information index associated with a random sample
of size $N$ from the Pearson Type III distribution is $Np/a^2$. Our intuition suggests
that we should have $I(L) + I(S^*) = I(L^*) + I(S) = Np/a^2$, which is easily
seen to obtain.
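
The identity behind the information-loss measure, $Z'(a)(1 - F(a)) - W'(a)F(a) = W(a)Z(a)$ per observation, can be checked numerically. The sketch below assumes $p = 2$, for which $f(x) = x e^{-x}$ and $F(x) = 1 - (1 + x)e^{-x}$ in closed form, and approximates the derivatives by central differences. The step size and the test points are arbitrary choices made for the example.

```python
import math

def f(x):
    """Pearson Type III density with p = 2: f(x) = x * exp(-x)."""
    return x * math.exp(-x)

def F(x):
    """Distribution function for p = 2: F(x) = 1 - (1 + x) * exp(-x)."""
    return 1.0 - (1.0 + x) * math.exp(-x)

def Z(x):
    """Reciprocal of Mills' ratio, f / (1 - F)."""
    return f(x) / (1.0 - F(x))

def W(x):
    """f / F."""
    return f(x) / F(x)

def derivative(g, x, h=1e-5):
    """Central-difference approximation to g'(x)."""
    return (g(x + h) - g(x - h)) / (2.0 * h)

def loss_per_observation(a):
    """Z'(a)(1 - F(a)) - W'(a) F(a), the information loss divided by N."""
    return derivative(Z, a) * (1.0 - F(a)) - derivative(W, a) * F(a)

for a in (0.5, 1.5, 3.0):
    print(a, loss_per_observation(a), W(a) * Z(a))
```

The two printed columns agree to the accuracy of the finite-difference approximation.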
ROTATION SAMPLING


By ALBERT ROSS ECKLER
Princeton University


1. Summary. This paper shows how to find minimum-variance estimates
of the mean $\alpha(t_i)$ of a time-dependent population, assuming that one is restricted
to the class of linear unbiased estimates. Each minimum-variance estimate is
based on a specified sample pattern (a set of sample values drawn from the
population at one or more distinct times). Let the random variable $X_{ij}$ denote the
value of element $j$ of the population at time $t_i$. The correlation between $X_{ij}$
and $X_{kj}$ is assumed to be $\rho^{|i-k|}$; the correlation between $X_{ij}$ and $X_{kl}$ ($j \ne l$) is assumed
to be zero; the variance of $X_{ij}$ is assumed to be $\sigma^2$ independent of time. Iterative
methods are developed; the estimate of the population mean $\alpha(t_{i-1})$ is used in
determining the population mean $\alpha(t_i)$.

The paper discusses two important methods of sampling: in one-level rotation 
sampling, the statistician can add to the sample pattern only sample values that 
have been drawn from the population at the current time; in two-level (and 
higher-level) rotation sampling, the statistician can add earlier sample values 
as well as current ones to the pattern. Schematic sample patterns associated with 
these two methods are illustrated in (3.1) and (4.1). 

The optimum structure of a sample pattern is considered from two viewpoints: 
the variance of a pattern consisting of $n$ sample values drawn at each time $t_i$ is
minimized; the number of sample values drawn at time $t_i$ is minimized while the
variance of the minimum-variance estimate is held constant. 

Finally, the estimation problem is generalized to include minimum-variance 
estimates of linear functions of two or more population means at different times. 

In order to maintain continuity, this paper presents published results along 
with new results; the latter are summarized below. The paper clarifies Patter- 
son’s fundamental method for finding minimum-variance linear unbiased esti- 
mates (Sections 2, 3) and extends his methods to two-level and three-level rota- 
tion sampling (Sections 4, 6, 8, 10, 13). The paper compares three methods of 
rotation sampling on a cost basis (Section 11) and shows how the one-level rota- 
tion sampling estimate of greatest practical interest can be derived from the two- 
level estimate (Sections 5, 14). Finally, the paper extends Cochran’s work in 
determining optimum patterns for the one-level rotation sampling estimate 


(Section 9). 


2. Introduction to rotation sampling. In survey sampling, the statistician 
sometimes must estimate at regular intervals of time a population parameter 
which varies with time. If there exists a relationship between the value of an
element in the population at time $t$ and the changed value of the same element
at the succeeding time $t + \Delta t$, then it is possible to use the information
contained in earlier samples to improve the current estimate of the population
parameter. In order to use the earlier sample information, one must carry out
the sampling in such a way that the two samples drawn at successive times $t$
and $t + \Delta t$ have some elements in common.

The name “rotation sampling” (suggested by Wilks) refers to the process of 
eliminating some of the old elements from the sample and adding new elements 
to the sample each time a new sample is drawn. This method of sampling is 
also called sampling on successive occasions with partial replacement of units 
(Patterson, Yates) and sampling for a time series (Hansen, Hurwitz and Madow). 
Double sampling can be regarded mathematically as rotation sampling involving 
a present sample and one overlapping earlier sample. 

We assume that we have a population $\pi$ and a set of times $(t_1, t_2, \dots, t_m)$.
Each element of the population has a set of $m$ values associated with it, one for
each time $t_i$. A sample pattern $P$ consists of a set of sample values $x_{ij}$, where $i$
identifies the time $t_i$ when the element was sampled, and $j$ identifies the
population element. The sample pattern $P$ can be visualized as an incomplete matrix
with $m$ rows and the number of columns equal to the number of distinct elements
with values represented in $P$. More definite sample patterns are discussed later.

We assume that the population $\pi$ has an infinite number of elements (eliminating
any correlation between the sample values $x_{ij}$ and $x_{kl}$). Let $X_i$ be the random
variable representing the population values at time $t_i$. We assume that $E(X_i) =
\alpha(t_i)$ and that $\mathrm{var}(X_i) = \sigma^2$ independent of the time. We specify the rest of the
second moments of the joint distribution of $(X_1, X_2, \dots, X_m)$ by means of the
exponential correlation assumption: the correlation $\rho(X_i, X_j)$ is equal to $\rho^{|i-j|}$.
This assumption implies that all partial correlation coefficients $\rho_{ij \cdot s}$ are zero if
$i < s < j$ or $i > s > j$.

We restrict ourselves to estimating the mean of the population $\pi$ at a given
time $t_i$, or more generally to estimating a linear combination of the means at
several different times (such as $\alpha = c_1 \alpha(t_1) + c_2 \alpha(t_2) + c_3 \alpha(t_3)$). The difference
between two successive means is the linear function of greatest practical
interest. We restrict ourselves to unbiased linear estimates $L(\alpha) = \sum w_{ij} x_{ij}$ of
the population mean; the summation is taken over the values in $P$. Throughout
the paper, we use the term unbiased in a stronger sense than usual. We require
not only that $E(L(\alpha)) = \alpha$, but also that $E(\sum_j w_{ij} x_{ij}) = c_i \alpha(t_i)$ for all $i$.

Our goal is to determine the minimum-variance estimate of the population
mean in the class of linear unbiased estimates based on the sample values in a
specified sample pattern. We denote a minimum-variance estimate by the
symbol $M(\alpha)$.

Patterson [5] derives a necessary and sufficient condition for a linear unbiased 
estimate to be a minimum-variance estimate. In view of the importance of his 
result in deriving minimum-variance estimates, we state it as a theorem but 
omit the proof. 
THEOREM 1. Assume that we have a sample pattern $P$ of values drawn at times
$t_1, t_2, \dots, t_m$ from a population $\pi$. Assume that the joint distribution of $(X_1,
X_2, \dots, X_m)$ has finite first and second moments. Let $c_1, c_2, \dots, c_m$ be $m$
constants (not all zero), and let $L(\alpha)$ be a linear unbiased estimate of

$$\alpha = c_1 \alpha(t_1) + c_2 \alpha(t_2) + \cdots + c_m \alpha(t_m).$$

$L(\alpha)$ is the minimum-variance estimate $M(\alpha)$ if and only if $\mathrm{cov}(x_{ij}, L(\alpha)) = k_{i\alpha}$
for all combinations of $i$ and $j$ in $P$. The theorem does not have to be restricted
to an infinite population size, a constant variance $\sigma^2$, or the exponential
correlation assumption.

Two corollaries of Theorem 1 are frequently useful. The proofs are quite simple 
so they are omitted. 

COROLLARY 1.1. Let $\bar{x}_i$ be a linear unbiased estimate of $\alpha(t_i)$ based on the sample
pattern $P$, and let the assumptions of Theorem 1 hold. Then $\mathrm{cov}(\bar{x}_i, M(\alpha)) =
k_{i\alpha}$, for $1 \le i \le m$.

COROLLARY 1.2. Let $M_i$ be the minimum-variance estimate of $\alpha(t_i)$ in the class
of linear unbiased estimates based on the sample pattern $P$, and let the assumptions
of Theorem 1 hold. Then $\mathrm{var}(M_i) = \mathrm{cov}(\bar{x}_i, M_i)$, for $1 \le i \le m$. Corollary 1.2
is useful in calculating the variance of complicated minimum-variance estimates.

Frequently many covariance-conditions must be checked in order to deter- 
mine whether or not an estimate L(a) is minimum-variance. In order to reduce 
this number, we derive a simplified form of the unbiased linear estimate that 
still contains the minimum-variance estimate. If the number of different ele- 
ments in the sample pattern is finite, it is evident that the pattern can be split 
up into a certain minimum number of subpatterns, each one of which is
rectangular (a complete matrix of $x_{ij}$ values).

THEOREM 2. Assume that the conditions of Theorem 1 hold. Assume that the
sample pattern $P$ has been broken up into a finite number of rectangular
subpatterns; let us consider subpattern $P_1$ which forms a complete matrix of $x_{ij}$ values with
$r$ rows and $c$ columns. Then the $c$ weights $w_{ij}$ associated with the values of $x_{ij}$ in
any one of the $r$ rows are all equal: $w_{11} = w_{12} = \cdots = w_{1c}$, $w_{21} = w_{22} = \cdots =
w_{2c}$, $\dots$, $w_{r1} = w_{r2} = \cdots = w_{rc}$.

PROOF. According to Theorem 1, the covariance-condition must hold for this
subpattern. Consider the identity $\mathrm{cov}(x_{i1}, M(\alpha)) - \mathrm{cov}(x_{i2}, M(\alpha)) = 0$.
Expanding this identity, we obtain an expression of the form

$$a_1(w_{11} - w_{12}) + a_2(w_{21} - w_{22}) + \cdots + a_r(w_{r1} - w_{r2}) = 0.$$

The coefficients $a_i$ depend only on the correlation model and the population
size. In order that this expression be identically zero, $w_{i1}$ must be equal to $w_{i2}$,
for $1 \le i \le r$. The theorem is proved by iterating this argument $c - 2$ times.
As a consequence of Theorem 2, we can express the minimum-variance estimate
$M(\alpha)$ as a linear combination of means of sample values; each mean value is
formed from the sample values in a row of a rectangular subpattern. In the rest
of this paper, we regard the mean value estimate as the canonical form of the
linear unbiased estimate.
3. One-level rotation sampling. In this section we summarize for future
reference a basic result from Patterson [5]: the determination of $M_i'$, the minimum-
variance linear unbiased estimate of $\alpha(t_i)$ based on a sample pattern of the type

(3.1)    [Schematic sample pattern: one row of sample values x for each of the
times $t_1, \dots, t_i$, successive rows overlapping in part; a horizontal line separates
the row for $t_i$ from the earlier rows, and the groups of elements are numbered
along the bottom. The detailed layout is not recoverable here.]

More precisely, the sample pattern is assumed to have $n$ sample values in it at
each time $t_k$, $1 \le k \le i$. $(1 - \mu)n$ of the elements in the sample at time $t_{k-1}$
are retained in the sample drawn at time $t_k$, and the remaining $\mu n$ elements are
replaced with the same number of new ones. The lines indicate how the sample
pattern is built up; at time $t_k$ the $k$th row of sample values is added to the $(k - 1)$
rows of earlier values. Since each enlargement of the pattern consists of a set of
sample values associated with a single time, we call this one-level rotation
sampling on the above pattern.

Patterson shows that the minimum-variance estimate $M_i'$ of $\alpha(t_i)$ based on
pattern (3.1) can be written in the iterative form

(3.2)    $$M_i' = A_i \bar{x}_{i1} + (1 - A_i) \bar{x}_{i2} - (B_i + C_i) \bar{x}_{i-1,2} + C_i \bar{x}_{i-1,3} + B_i M'_{i-1},$$

where $A_i$, $B_i$ and $C_i$ are unknown coefficients to be determined, and $M'_{i-1}$ is
the minimum-variance linear unbiased estimate of $\alpha(t_{i-1})$ based on the pattern
above the line in (3.1). The first subscript of a sample average denotes the time
that the sample values were drawn, and the second subscript identifies the
elements represented in the sample average (see bottom of (3.1)). The iteration
reflects the way in which the sample pattern is built up row by row.

Patterson shows that the iterative estimate is minimum-variance by means of
the sufficiency of the covariance-condition (Theorem 1); the four possible
covariance-conditions (determined by Theorem 2) can be reduced to three
independent equations which can be solved for $A_i$, $B_i$ and $C_i$ in terms of earlier
coefficients:

(3.3)    $$B_i = \rho(1 - A_i), \qquad C_i = 0, \qquad A_i = \frac{\mu(1 - \rho^2) + (1 - \mu)\rho^2 A_{i-1}}{1 - \mu\rho^2 + (1 - \mu)\rho^2 A_{i-1}}.$$


Given $A_1$ (equal to $\mu$), any minimum-variance estimate $M_i'$ can be determined by
a repeated application of equations (3.3). The variance of $M_i'$ is determined with
the aid of Corollary 1.2,

(3.4)    $$\mathrm{var}(M_i') = \mathrm{cov}(M_i', \bar{x}_{i1}) = \frac{\sigma^2 A_i}{\mu n}.$$
5 
It is easy to show that the sequence of $A_i$ converges. $\mathrm{var}(M'_{i-1})$ is greater
than or equal to $\mathrm{var}(M_i')$ because the sample pattern associated with $M'_{i-1}$
is contained in the pattern associated with $M_i'$. Since the sequence of variances
is bounded from below by zero, we conclude that the sequence of variances (and
therefore the sequence of $A_i$) converges. The limiting value is

(3.5)    $$\lim_{i \to \infty} A_i = A = \frac{-(1 - \rho^2) + \sqrt{(1 - \rho^2)\left(1 - \rho^2(1 - 4\mu(1 - \mu))\right)}}{2\rho^2(1 - \mu)}.$$

Most of the above formulas simplify considerably when $\mu$ is equal to one-half,
which will subsequently be shown to be the case of greatest practical interest.
For example,

(3.6)    $$A = \frac{-(1 - \rho^2) + \sqrt{1 - \rho^2}}{\rho^2}.$$

We tabulate $A_i$ for several values of $\rho$ in Table 3.1.


TABLE 3.1
Values of the weights $A_i$ as a function of $\rho$ for a replacement rate $\mu$ of one-half
(one-level sampling)

[Most entries of this table are illegible. The rows are indexed by $\rho$ = .1, .2,
.3, .4, .5, .6, .7, .75, .8, .85, .9, .95 and 1.00. Only the last two rows can be
read with reasonable certainty; "..." marks a lost entry.]

  rho     A_1     A_2     A_3     A_4     A_5     A_6     A_7     A_8     A_9
  .95    .5000    ...    .2944    ...    .2526   .2456   .2420   .2401   .2391
  1.00   .5000    ...    .2500    ...    .1667   .1427   .1250   .1111   .1000

Variance of the minimum-variance estimate = $\mathrm{var}(M') = (2\sigma^2/n)\,A$


If we examine Table 3.1, we find that the reduction in variance achieved by
rotation sampling is quite small for most values of $\rho$ and $A_i$; the variance of the
estimate $M_i'$ is not reduced by one-half until the correlation reaches the very
high value of .95. It seems quite likely that rotation sampling will be of most
value when (a) the correlation is high, and (b) it is so difficult to draw a sample
that the sample size must be kept as small as possible. If it costs no more to
carry out rotation sampling than independent random sampling, then even a
modest reduction of five to ten per cent in variance will be worthwhile.
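
The weights $A_i$ can be regenerated numerically. The sketch below does not transcribe the printed recursion; it assumes a recursion obtained by weighting two uncorrelated estimates of the current mean inversely to their variances (the mean of the $\mu n$ new values, and the regression-type estimate built from the retained values and the previous estimate). That assumed form starts from $A_1 = \mu$ and was cross-checked against the limiting value (3.5). Function names are illustrative.

```python
import math

def one_level_weights(rho, mu, count):
    """Return [A_1, ..., A_count] under the assumed recursion, with A_1 = mu."""
    A = [mu]
    while len(A) < count:
        prev = A[-1]
        num = mu * (1.0 - rho ** 2) + (1.0 - mu) * rho ** 2 * prev
        den = 1.0 - mu * rho ** 2 + (1.0 - mu) * rho ** 2 * prev
        A.append(num / den)
    return A

def one_level_limit(rho, mu):
    """Limiting weight from (3.5); requires 0 < rho <= 1 and 0 < mu < 1."""
    one_minus = 1.0 - rho ** 2
    root = math.sqrt(one_minus * (1.0 - rho ** 2 * (1.0 - 4.0 * mu * (1.0 - mu))))
    return (-one_minus + root) / (2.0 * rho ** 2 * (1.0 - mu))

weights = one_level_weights(0.95, 0.5, 9)
print([round(w, 4) for w in weights])
print(round(one_level_limit(0.95, 0.5), 4))
# Ratio of var(M_i') to the variance sigma^2/n of a fresh sample mean, for mu = 1/2.
print(round(2.0 * one_level_limit(0.95, 0.5), 3))
```

For $\mu$ equal to one-half the variance ratio is $2A$, which is how the statement about a reduction by one-half near a correlation of .95 can be checked.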
One-level rotation sampling has been investigated by several authors. The
most important work was done by Patterson [5]; Yates [6] presents many results
from Patterson’s paper without giving any derivations. Jessen [4] considered the
special case in which $\mu$ was equal to one-half and the sample values were
restricted to times $t_1$ and $t_2$.


4. Two-level rotation sampling. In the preceding section, we assumed that
the sample pattern (3.1) was increased at time $t_k$ by adding sample values of the
form $x_{kj}$. Relaxing this assumption, we now permit sample values of the forms
$x_{(k-1)j}$ and $x_{kj}$ to be added to the sample pattern at time $t_k$. Clearly, this is
possible only if we have records of the earlier values of the elements in the
population. We call this two-level rotation sampling to emphasize that both present
values and immediately preceding values can be added to the pattern; the
generalization to three-level or multi-level sampling is obvious.

Since it is frequently cheaper in a sampling survey to obtain the sample values
$x_{(k-1)j}$ and $x_{kj}$ simultaneously instead of at two separate times, we assume a
sample pattern of the type

(4.1)    [Schematic sample pattern: each new set of elements contributes sample
values x at two consecutive times; lines show how the pattern is built up, and
the two most recent groups of elements are numbered 2 and 1 along the bottom.
The detailed layout is not recoverable here.]

The lines indicate how the sample pattern is built up; at time $t_k$ a new set of $n$
elements is drawn from the population and the associated sample values for the
times $t_k$ and $t_{k-1}$ are recorded. In rotation sampling, it is not sufficient to specify
a sample pattern; the method by which the pattern is built up in time determines
the most suitable iterative form of the minimum-variance estimate.

We now show by means of the covariance-condition (Theorem 1) that the
minimum-variance estimate of $\alpha(t_i)$ based on the pattern (4.1) can be written
in the iterative form

(4.2)    $$M_i'' = \bar{x}_{i1} - a_i \bar{x}_{i-1,1} + a_i M''_{i-1},$$

where $a_i$ is to be determined, $M''_{i-1}$ is the minimum-variance linear unbiased
estimate based on the sample pattern above the line, and the subscripts of the
sample averages are defined as in the preceding section.

Using Theorem 2, we conclude that the only possible covariance-condition is

$$\mathrm{cov}(\bar{x}_{i-1,1}, M_i'') = \frac{\sigma^2}{n}(\rho - a_i),$$

$$\mathrm{cov}(\bar{x}_{i-1,2}, M_i'') = a_i\, \mathrm{cov}(\bar{x}_{i-1,2}, M''_{i-1}) = \frac{\sigma^2}{n}\, a_i (1 - a_{i-1}\rho).$$

In other words, we have the solution

(4.3)    $$a_i = \frac{\rho}{2 - a_{i-1}\rho},$$

(4.4)    $$\mathrm{var}(M_i'') = \frac{\sigma^2}{n}(1 - a_i \rho).$$

It can be shown by an argument similar to the one in the preceding section
that the sequence of $a_i$ converges. The limiting value is

(4.5)    $$\lim_{i \to \infty} a_i = a = \frac{1 - \sqrt{1 - \rho^2}}{\rho},$$

(4.6)    $$\lim_{i \to \infty} \mathrm{var}(M_i'') = \frac{\sigma^2}{n} \sqrt{1 - \rho^2}.$$

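We tabulate $a_i$ for several values of $\rho$ in Table 4.1, and list the first five terms
below.

$$a_1 = 0, \qquad a_2 = \frac{\rho}{2}, \qquad a_3 = \frac{2\rho}{4 - \rho^2}, \qquad a_4 = \frac{\rho(4 - \rho^2)}{4(2 - \rho^2)}, \qquad a_5 = \frac{4\rho(2 - \rho^2)}{16 - 12\rho^2 + \rho^4}.$$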

TABLE 4.1
Values of the weights $a_i$ as a function of $\rho$
(two-level sampling)

[Most entries of this table are illegible. Only the last two rows can be read with
reasonable certainty; "..." marks a lost entry. The column labels are inferred
from the recursion for $a_i$ with $a_1 = 0$.]

  rho     a_3     a_4     a_5     a_6     a_7     a_8     a_9     a_10    a_11    limit
  .95    .6133   .6703    ...    .7100   .7167   .7202   .7220   .7229   .7234   .7239
  1.00   .6667   .7500    ...    .8333   .8573   .8750   .8889   .9000   .9091  1.0000

Variance of the minimum-variance estimate = $\mathrm{var}(M'') = (\sigma^2/n)(1 - a\rho)$


The two-level rotation sampling problem was first solved by Bershad [1].
Using straightforward minimization methods, he determined the unknown
coefficients in the general (non-iterative) linear unbiased estimate of $\alpha(t_i)$. The key to
his solution is a method of evaluating certain types of large determinants by
means of continued fractions. A similar two-level sampling problem is
discussed in [3].
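
The two-level weights follow from equating the two covariances of this section, $\rho - a_i = a_i(1 - a_{i-1}\rho)$, which gives $a_i = \rho / (2 - a_{i-1}\rho)$. The sketch below iterates this relation, assuming the starting value $a_1 = 0$ (no earlier estimate exists at the first time), and compares the result with the limiting value $(1 - \sqrt{1 - \rho^2})/\rho$. Function names are illustrative.

```python
import math

def two_level_weights(rho, count):
    """Return [a_1, ..., a_count] with a_1 = 0 and a_i = rho / (2 - rho * a_{i-1})."""
    a = [0.0]
    while len(a) < count:
        a.append(rho / (2.0 - rho * a[-1]))
    return a

def two_level_limit(rho):
    """Limiting weight (1 - sqrt(1 - rho^2)) / rho; requires 0 < rho <= 1."""
    return (1.0 - math.sqrt(1.0 - rho ** 2)) / rho

def variance_factor(a_i, rho):
    """var(M_i'') divided by sigma^2 / n, that is 1 - a_i * rho."""
    return 1.0 - a_i * rho

a_seq = two_level_weights(0.95, 11)
print([round(x, 4) for x in a_seq])
print(round(two_level_limit(0.95), 4))
print(round(variance_factor(two_level_limit(0.95), 0.95), 4))
```

At the limit the variance factor equals $\sqrt{1 - \rho^2}$, which gives a quick consistency check between the two limiting formulas.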


5. Relationship between one-level and two-level rotation sampling. We 
preface this section with a lemma which is a generalization of the well-known 
method used to find the minimum-variance linear combination of two un- 
correlated estimates of the same parameter. The proof is simple and therefore 
is omitted. 

LEMMA 1. We assume that we have a finite sample pattern $P$ which can be
partitioned into two uncorrelated subpatterns $P_1$ and $P_2$, the first consisting of sample
values all drawn at time $t_k$ or earlier, and the second consisting of sample values all
drawn at time $t_k$ or later ($1 \le k \le i$). Let $M$, $M_1$, and $M_2$ denote the minimum-
variance unbiased linear estimates of $\alpha(t_k)$ based on the patterns $P$, $P_1$ and $P_2$
respectively. Then $M = (M_1\, \mathrm{var}(M_2) + M_2\, \mathrm{var}(M_1)) / (\mathrm{var}(M_1) + \mathrm{var}(M_2))$,
and $\mathrm{var}(M) = \mathrm{var}(M_1)\, \mathrm{var}(M_2) / (\mathrm{var}(M_1) + \mathrm{var}(M_2))$.

Using this lemma, we can easily show how the minimum-variance linear
unbiased estimate $M_i'$ based on the sample pattern (3.1) with $\mu$ equal to one-half
can be determined if we know the minimum-variance linear unbiased estimate
$M_i''$ based on the two-level sample pattern (4.1). Consider the following partition
of the one-level pattern with $\mu$ equal to one-half:

[Schematic: the one-level pattern with $\mu$ equal to one-half, divided by a vertical
line into a left part and a right part. The detailed layout is not recoverable here.]

To the left of the vertical line, the pattern is a two-level sampling pattern with a
sample size of $n/2$ instead of $n$ (the sample values drawn at time $t_0$ in the two-
level pattern (4.1) have a weight of zero in the minimum-variance estimate and
can be ignored). Therefore the minimum-variance estimate of $\alpha(t_i)$ based on the
sample values to the left of the line is $M_i''$. The minimum-variance estimate of
$\alpha(t_i)$ based on the $n/2$ sample values to the right of the line is the mean of these
values. Applying Lemma 1, we find that
$$M_i' = \frac{1}{2 - a_i\rho}\, M_i'' + \frac{1 - a_i\rho}{2 - a_i\rho}\, \bar{x}_{i1}, \qquad \mathrm{var}(M_i') = \frac{2\sigma^2}{n} \cdot \frac{1 - a_i\rho}{2 - a_i\rho}.$$

We compare the second equation with equation (3.4) and conclude that

(5.1)    $$A_i = \frac{1 - a_i\rho}{2 - a_i\rho}.$$

We have solved the one-level problem by means of the two-level problem by
expressing $A_i$ as a function of $a_i$. The identity (5.1) can also be proved by
induction.
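
The relation between the one-level and two-level weights can be checked step by step in exact arithmetic. The sketch below assumes $\mu$ equal to one-half, the two-level recursion $a_i = \rho / (2 - a_{i-1}\rho)$ with $a_1 = 0$, and a one-level recursion derived here by inverse-variance weighting (an assumed form, not a transcription of the printed formula), and tests $A_i = (1 - a_i\rho)/(2 - a_i\rho)$ for the first several $i$. The helper `combine` is a direct statement of Lemma 1 for two uncorrelated estimates; all names are illustrative.

```python
from fractions import Fraction

def combine(m1, v1, m2, v2):
    """Lemma 1: minimum-variance combination of two uncorrelated unbiased estimates."""
    estimate = (m1 * v2 + m2 * v1) / (v1 + v2)
    variance = v1 * v2 / (v1 + v2)
    return estimate, variance

def relation_holds(rho, steps):
    """Check A_i == (1 - a_i rho) / (2 - a_i rho) for i = 1, ..., steps with mu = 1/2."""
    mu = Fraction(1, 2)
    A = mu                  # A_1 = mu
    a = Fraction(0)         # a_1 = 0 (assumed starting value)
    for _ in range(steps):
        if A != (1 - a * rho) / (2 - a * rho):
            return False
        # Advance both recursions by one time step.
        A = (mu * (1 - rho ** 2) + (1 - mu) * rho ** 2 * A) / (
            1 - mu * rho ** 2 + (1 - mu) * rho ** 2 * A
        )
        a = rho / (2 - a * rho)
    return True

print(relation_holds(Fraction(3, 4), 20))

# Lemma 1 with variances in units of sigma^2 / n: the two-level estimate on n/2
# elements has variance 2 (1 - a rho); the mean of n/2 new values has variance 2.
rho, a = Fraction(3, 4), Fraction(3, 8)     # a_2 = rho / 2
_, v = combine(Fraction(0), 2 * (1 - a * rho), Fraction(0), Fraction(2))
print(v == 2 * (1 - a * rho) / (2 - a * rho))
```

Rational arithmetic avoids any tolerance in the comparison, so a failure would indicate a genuine mismatch between the two recursions rather than rounding.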
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6. Three-level rotation sampling. We now assume that sample values of the 
form 2@~2);, X@—1»; and 2; can be added to the sample pattern at time ¢ . In 
analogy with the two-level pattern, we assume a sample pattern of the type 


(6.1)    [Diagram: the three-level sample pattern, built up by columns; lines mark off the columns, and the three most recent columns are numbered 3, 2, 1.]


The lines indicate how the sampling pattern is built up by columns; at time $t_i$
a new set of n elements is drawn from the population and the last three sample
values are recorded. We let $M_i'''$ denote the minimum-variance estimate of $a(t_i)$
based on pattern (6.1).

The problem of finding an iterative estimate which contains the minimum-variance
estimate is twofold: (a) we must find an iterative estimate such that
the number of unknown coefficients is equal to the number of independent
covariance-conditions, and (b) we must ascertain that the set of simultaneous
equations has a non-trivial solution (the unknown coefficients must not all be
zero). Condition (a) becomes non-trivial for the first time in three-level sampling.
It can be shown that if we restrict ourselves to an iterative estimate involving
sample averages and $M_{i-1}'''$, we always have one too many independent
covariance-conditions.

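Consider the estimate

(6.2)    $M_i''' = \bar{x}_{i,1} - a_i \bar{x}_{i-1,1} + a_i M_{i-1}''' - b_i \bar{x}_{i-2,1} - (c_i + f_i)\bar{x}_{i-2,2} + (b_i + c_i)\bar{x}_{i-2,3} - d_i \bar{x}_{i-3,2} - e_i \bar{x}_{i-3,3} + (d_i + e_i)\bar{x}_{i-3,4} + f_i M_{i-2}'''.$

This estimate contains both $M_{i-1}'''$ and $M_{i-2}'''$; a little consideration shows that
condition (a) is now satisfied. In order to simplify the algebra in evaluating the
unknown coefficients, we express equation (6.2) in an equivalent form by
substituting the term $a_i \bar{x}_{i-1,2}$ for $a_i M_{i-1}'''$.

According to Theorem 2, there are six independent covariance-conditions
which can be expressed in terms of the six unknown coefficients. If we can find a
non-trivial solution to these equations, we conclude (Theorem 1) that we have
found the minimum-variance estimate of $a(t_i)$ based on pattern (6.1). The
covariance-conditions require the covariances within each of the following four
groups to be equal:

(6.3)
    $\operatorname{cov}(\bar{x}_{i-4,5}, M_i''') = f_i \operatorname{cov}(\bar{x}_{i-4,5}, M_{i-2}''')$
    $\operatorname{cov}(\bar{x}_{i-4,4}, M_i''') = f_i \operatorname{cov}(\bar{x}_{i-4,4}, M_{i-2}''') + \dfrac{\sigma^2}{n}(d_i + e_i)p$

(6.4), (6.5)
    $\operatorname{cov}(\bar{x}_{i-3,4}, M_i''') = f_i \operatorname{cov}(\bar{x}_{i-3,4}, M_{i-2}''') + \dfrac{\sigma^2}{n}(d_i + e_i)$
    $\operatorname{cov}(\bar{x}_{i-3,3}, M_i''') = f_i \operatorname{cov}(\bar{x}_{i-3,3}, M_{i-2}''') + \dfrac{\sigma^2}{n}[-e_i + (b_i + c_i)p]$
    $\operatorname{cov}(\bar{x}_{i-3,2}, M_i''') = \dfrac{\sigma^2}{n}[-d_i - (c_i + f_i)p + p^2 a_i]$

(6.6), (6.7)
    $\operatorname{cov}(\bar{x}_{i-2,3}, M_i''') = f_i \operatorname{var}(M_{i-2}''') + \dfrac{\sigma^2}{n}[b_i + c_i - p e_i]$
    $\operatorname{cov}(\bar{x}_{i-2,2}, M_i''') = \dfrac{\sigma^2}{n}[-p d_i + p a_i - (c_i + f_i)]$
    $\operatorname{cov}(\bar{x}_{i-2,1}, M_i''') = \dfrac{\sigma^2}{n}[-b_i - p a_i + p^2]$

(6.8)
    $\operatorname{cov}(\bar{x}_{i-1,2}, M_i''') = \dfrac{\sigma^2}{n}[-p^2 d_i - p(c_i + f_i) + a_i]$
    $\operatorname{cov}(\bar{x}_{i-1,1}, M_i''') = \dfrac{\sigma^2}{n}[p - a_i - b_i p]$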


If we compare equation (6.7) with (6.8), we find that by setting $a_i$ equal to $p/2$
we can reduce these two equations to one independent equation. Furthermore,
we conclude from equation (6.3) that $d_i$ is equal to $-e_i$. If we make these
substitutions, we have left four equations in four unknowns which can be solved by
straightforward algebra. The only term which needs special attention is the
$\operatorname{cov}(\bar{x}_{i-3,3}, M_{i-2}''')$ in equation (6.5). Starting with the equation

    $p \operatorname{cov}(\bar{x}_{i-3,3},\, M_{i-2}''' - \bar{x}_{i-2,3}) = \operatorname{cov}(\bar{x}_{i-2,3},\, M_{i-2}''' - \bar{x}_{i-2,3})$

we conclude that

    $\operatorname{cov}(\bar{x}_{i-3,3}, M_{i-2}''') = \dfrac{1}{p}\operatorname{var}(M_{i-2}''') + \dfrac{\sigma^2}{n}\left(p - \dfrac{1}{p}\right).$


The solution to the four equations is

    $b_i = \dfrac{p^2[(3 + p^2) - 2b_{i-2}(1 - p^2)]}{2[(9 - p^2) - 2b_{i-2}(3 + p^2)]},$

    $f_i = \dfrac{p^2 + 2b_i}{3 - 2b_{i-2}},$

    $c_i = \dfrac{b_i(1 + p^2) - f_i}{1 - p^2},$

    $d_i = -e_i = -(b_i + c_i)p.$

As indicated by this solution, we regard $b_i$ as the fundamental variable; it is the
only unknown coefficient to appear in the variance of $M_i'''$:

(6.10)    $\operatorname{var}(M_i''') = \dfrac{\sigma^2}{n}\left[1 - \dfrac{p^2}{2} - b_i p^2\right].$


The values $b_1$, $b_2$ and $b_3$ must be determined by an independent method. It is
quite easy to evaluate these coefficients by a straightforward minimization of
$\operatorname{var}(M_1''')$ through $\operatorname{var}(M_3''')$. The first five values are

    $b_1 = 0, \qquad b_2 = 0, \qquad b_3 = \dfrac{p^2}{6},$

    $b_4 = \dfrac{p^2(3 + p^2)}{2(9 - p^2)}, \qquad b_5 = \dfrac{p^2(9 + 2p^2 + p^4)}{2(9 + p^2)(3 - p^2)}.$


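The $b_i$ converge to the limiting value

    $\lim_{i \to \infty} b_i = \dfrac{(3 - p^2) - \sqrt{(1 - p^2)(9 - p^2)}}{4}$

and the limiting variance is

(6.11)    $\operatorname{var}(M''') = \dfrac{\sigma^2}{n}\left[1 - \dfrac{p^2}{4}\left(5 - p^2 - \sqrt{(1 - p^2)(9 - p^2)}\right)\right].$

Unless p is close to unity, $\operatorname{var}(M''')$ is not much less than $\sigma^2/n$.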
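
The convergence of the $b_i$ can be checked numerically. The sketch below assumes the two-step recursion $b_i = g(b_{i-2})$ and the variance factor $1 - p^2/2 - b_i p^2$ as they are read from this section (with $a_i = p/2$); all function names are hypothetical.

```python
import math


def next_b(b_prev2, p):
    """One step of the assumed recursion b_i = g(b_{i-2}) for the three-level estimate."""
    p2 = p * p
    numerator = p2 * ((3 + p2) - 2 * b_prev2 * (1 - p2))
    denominator = 2 * ((9 - p2) - 2 * b_prev2 * (3 + p2))
    return numerator / denominator


def limit_b(p):
    """Limiting value of b_i, the smaller root of 4b^2 - 2(3 - p^2)b + p^2 = 0."""
    p2 = p * p
    return ((3 - p2) - math.sqrt((1 - p2) * (9 - p2))) / 4


def variance_factor(b, p):
    """var(M''') divided by sigma^2/n, assuming a_i = p/2."""
    return 1 - p * p / 2 - b * p * p


def iterate_b(p, steps, b_start=0.0):
    """Apply the recursion `steps` times, starting from b_start (b_2 = 0 by default)."""
    b = b_start
    for _ in range(steps):
        b = next_b(b, p)
    return b


p = 0.9
b_inf = limit_b(p)
limiting_factor = variance_factor(b_inf, p)   # roughly 0.40 of sigma^2/n at p = 0.9
```

Starting from $b_2 = 0$ the even-indexed values approach the limit geometrically, so a handful of steps already reproduces the limiting variance to several decimals.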

The method used in solving the three-level rotation sampling problem can be 
extended without any conceptual difficulty to higher-level sampling. However, 
the smaller variance obtained by multi-level procedures over single-level proce- 
dures (discussed in a later section) is probably not worth the very great increase 
in algebraic complexity. For example, the four-level rotation sampling problem 
requires the solution of at least eight simultaneous linear equations with alge- 
braic coefficients. 


7. Truncated patterns. The one-level and multi-level rotation sampling
estimates discussed in the previous sections were based on sample patterns that
extended over any number of distinct time-levels. However, there are several
practical reasons for truncating these patterns, that is, ignoring all sample
values except those associated with the N most recent times $t_i, t_{i-1}, \ldots, t_{i-N+1}$.

Most of the reduction in variance accomplished by rotation sampling is at- 
tributable to the sample values that were drawn most recently. For example, 
suppose we want the variance of the minimum-variance estimate based on a 
truncated pattern to be no more than ten per cent larger than the variance of the 
corresponding minimum-variance estimate based on an infinitely long pattern. 
If we are carrying out one-level sampling, and if p is equal to .5, .7, .9 or .95, we 
should use a pattern with sample values restricted to 2, 3, 4 or 5 time-levels, 
respectively. These and other variance comparisons can be easily obtained from 
Tables 3.1 and 4.1. It seems pointless to continue using older sample values
which make virtually no contribution to the estimate. 

A more important reason for using truncated patterns is the possibility that the 
exponential correlation model may describe the behavior of the parent population 
only locally; it may be adequate for sample values associated with nearby times, 
but not for sample values farther apart in time. For example, in economic
populations with month-to-month correlation p, the year-to-year correlation often
is much larger than $p^{12}$. In other words, an underlying cyclic behavior of the
population may upset the exponential correlation model unless the pattern 
length is a small part of the period. 

Finally, a truncated pattern is easier to handle computationally than an ever- 
increasing pattern. After deciding how far back to truncate, we compute a set 
of coefficients to be multiplied into the sample averages. At each time $t_i$, we
use the same set of coefficients but apply them to a different set of sample aver- 
ages. 


8. Generalization of the sample pattern. In this section and the next two sec- 
tions, we consider what happens when we allow more freedom in the choice of a 
sample pattern. We first describe the modifications necessary when the number 
of sample values added to the pattern varies with time. 

To be specific, we assume a sample pattern of the type shown in (3.1), but with
$n_k$ sample values associated with time $t_k$, $1 \le k \le i$. These $n_k$ values can be
divided into two classes: $n_k'$ are associated with population elements represented
in the sample pattern at time $t_{k-1}$, and the remaining $n_k''$ are associated with
elements entering the pattern for the first time. We assume one-level rotation
sampling; the pattern is built up row by row.

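The minimum-variance estimate $M_i'$ based on this sample pattern can be found
by the same methods as before. We quote Patterson's results; equation (3.3)
generalizes to

(8.1)    $1 - A_i = \dfrac{n_i' n_{i-1}''}{n_i n_{i-1}'' - p^2 n_i''(n_{i-1}'' - A_{i-1} n_i')}$

and the variance of the estimate is

(8.2)    $\operatorname{var}(M_i') = \dfrac{\sigma^2 A_i}{n_i''}$    if $n_i'' \ne 0$, $n_{i-1}'' \ne 0$,

         $\operatorname{var}(M_i') = \sigma^2\left[\dfrac{1 - p^2}{n_i'} + \dfrac{p^2 A_{i-1}}{n_{i-1}''}\right]$    if $n_i'' = 0$, $n_{i-1}'' \ne 0$.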

Analogously, we assume a sample pattern of the type shown in (4.1). At
time $t_k$ we add to this pattern by drawing a new set of $n_k$ elements from the
population and recording the associated sample values for times $t_k$ and $t_{k-1}$
(two-level rotation sampling).

The minimum-variance estimate $M_i''$ based on this sample pattern can be
found by the same methods as before. We omit details and give the results.
Equation (4.3) generalizes to

(8.3)    $a_i = \dfrac{p\, n_{i-1}}{n_{i-1} + n_i(1 - a_{i-1} p)}$

and the variance of the estimate is

(8.4)    $\operatorname{var}(M_i'') = \dfrac{\sigma^2}{n_i}(1 - a_i p).$


9. Optimum choice of $\mu$ in one-level rotation sampling. We have shown in
Section 3 how to find $M_i'$ for any preassigned value of $\mu$; we now determine that
value of $\mu$ which corresponds to the minimum value of the function $\operatorname{var}(M_i'(\mu))$.
The solution to this problem depends only on the point at which the pattern is
truncated.

If we wish to find the optimum $\mu$ for infinitely long patterns, the problem is
easy to solve: we differentiate the variable $A/\mu$ of equation (3.5) with respect to
$\mu$, set the result equal to zero, and solve for $\mu$. If p is less than unity, a little
calculation leads to the result $\mu = 1/2$; if p is equal to unity, it is intuitively clear
that the optimum value of $\mu$ is unity.

If we adopt the practical viewpoint of Section 7 and decide to use truncated
patterns, the problem of finding a minimum-variance $\mu$ is conceptually simple but
computationally tedious. If we have a pattern consisting of sample values
associated with the time-levels $t_i$ and $t_{i-1}$ only, we differentiate $A_2/\mu$ with respect to
$\mu$, set the result equal to zero, and solve a quadratic equation in $\mu$. The optimum
value of $\mu$ is

(9.1)    $\mu = \dfrac{1 - \sqrt{1 - p^2}}{p^2}.$
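
The two optima quoted above can be reproduced by a crude grid search. The sketch assumes the constant-rate form of recursion (8.1), namely $A_1 = \mu$ and $1 - A_i = (1 - \mu)/[1 - p^2(\mu - A_{i-1}(1 - \mu))]$, with variance factor $A_i/\mu$; the function names and the grid resolution are arbitrary choices made for illustration.

```python
def one_level_A(mu, p, length):
    """A_i after `length` time-levels for a constant replacement rate mu (A_1 = mu)."""
    a = mu
    for _ in range(length - 1):
        a = 1 - (1 - mu) / (1 - p * p * (mu - a * (1 - mu)))
    return a


def variance_factor(mu, p, length):
    """var(M') divided by sigma^2/n for a pattern `length` time-levels long."""
    return one_level_A(mu, p, length) / mu


def best_mu(p, length, grid=1000):
    """Replacement rate on a grid of step 1/grid that minimizes the variance factor."""
    candidates = [k / grid for k in range(1, grid + 1)]
    return min(candidates, key=lambda mu: variance_factor(mu, p, length))


mu_two_levels = best_mu(0.8, 2)      # close to (1 - sqrt(1 - p^2)) / p^2 = 0.625
mu_long_pattern = best_mu(0.8, 200)  # close to 1/2
```

For a pattern two time-levels long the factor reduces to $(1 - p^2\mu)/(1 - p^2\mu^2)$, the familiar two-occasion result, which is why the grid search lands on (9.1).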


Similarly, we can determine the optimum $\mu$ for patterns consisting of sample
values associated with three or four time-levels. The algebra is laborious; for
example, when the pattern is four time-levels long, one must solve a sixth-degree
equation in $\mu$ with coefficients that are fourth-degree polynomials in $p^2$. Omitting
these equations, we tabulate optimum values of $\mu$ for selected values of p:


Pattern 
Leagth [| *. 





-612 .670 
.563 623 : 


The corresponding variances are given in Table 9.1 at the end of this section.

We now consider the problem of choosing an optimum set of $\mu$ for the one-level
rotation sampling pattern when $\mu$ is not restricted to a constant value in
time, as it is in the sampling pattern (3.1). This problem has two parts: one
must determine the set $(1, \mu_2, \mu_3, \ldots, \mu_i)$ which corresponds to the
minimum-variance estimate $M_i'$, and one must calculate the variance of this estimate.

This problem was first solved by Cochran [2]; we summarize his results for
the purpose of comparison. The solution is simplified by the fact that the
optimum set $(1, \mu_2, \mu_3, \ldots, \mu_k)$ contains the optimum set $(1, \mu_2, \mu_3, \ldots, \mu_{k-1})$,
$2 \le k \le i$; therefore the solution can be obtained in an iterative form. If we
define $G_i$ by the equation $\operatorname{var}(M_i') = (\sigma^2/n)G_i$, then the variance can be
calculated from the iterative equation

(9.2)    $\dfrac{1}{G_i} = 1 + \dfrac{(1 - \sqrt{1 - p^2})^2}{p^2 G_{i-1}}$

and the optimum values of $\mu_k$, $1 < k \le i$, are given by

(9.3)    $\mu_k = 1 + \dfrac{(1 - p^2) - \sqrt{1 - p^2}}{p^2 G_{k-1}}.$

The limiting values of $G_i$ and $\mu_k$ are

(9.4)    $\lim_{i \to \infty} G_i = \dfrac{2}{p^2}\left[\sqrt{1 - p^2} - (1 - p^2)\right], \qquad \lim_{k \to \infty} \mu_k = \dfrac{1}{2}.$

We tabulate $\mu_2$, $\mu_3$ and $\mu_4$ for selected values of p; the corresponding variances
are presented in Table 9.1 at the end of this section. When p is equal to zero, it
does not matter what replacement rate is used.


0 4 -6 8 


.522 556 
502 -506 .531 ‘ 637 
-500 501 -508 533 


Since the successive values of $\mu_i$ are different, the optimum patterns cannot be
conveniently truncated. The truncated patterns gradually change in form from
the optimum pattern with $\mu_k$ given by equation (9.3) to the limiting pattern in
which all $\mu_k$ are equal to one-half.
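
The iterative solution attributed to Cochran can be sketched as follows. The code assumes $G_1 = 1$, $\mu_1 = 1$ and the forms $1/G_i = 1 + (1 - \sqrt{1 - p^2})^2/(p^2 G_{i-1})$ and $\mu_k = 1 + [(1 - p^2) - \sqrt{1 - p^2}]/(p^2 G_{k-1})$; these are readings of (9.2) and (9.3), and the function name is hypothetical.

```python
import math


def cochran_sequence(p, i):
    """Return ([G_1..G_i], [mu_1..mu_i]) for 0 < p < 1 under the assumed recursions."""
    s = math.sqrt(1 - p * p)
    G = [1.0]    # G_1 = 1: the first sample consists of n new elements
    mu = [1.0]   # mu_1 = 1 for the same reason
    for _ in range(2, i + 1):
        g_prev = G[-1]
        mu.append(1 + ((1 - p * p) - s) / (p * p * g_prev))
        G.append(1 / (1 + (1 - s) ** 2 / (p * p * g_prev)))
    return G, mu


G, mu = cochran_sequence(0.8, 60)
# For p = 0.8 the variance factor G_i settles near 0.75 and mu_k near 1/2.
```

Both sequences settle within a few steps, which illustrates why the gain from letting $\mu$ vary in time is small compared with a fixed rate of one-half.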

Table 9.1 gives the variances of the estimates that have been discussed in this
section. For comparison, we include the variance of the minimum-variance
estimate based on the sample pattern (3.1) with $\mu$ equal to one-half. The most
striking characteristic of this table is the small difference in variance between the
two kinds of optimum estimates and the one-half replacement rate estimate. In
practical applications of rotation sampling, one might as well use the latter
pattern, since it is much simpler to apply and can be conveniently truncated.

TABLE 9.1
Comparison of the variances of three minimum-variance estimates based on different
restrictions on the replacement rates $\mu_i$
(one-level sampling)


Length of 


pattern Restriction on p-values p=0 3 4 


2 No restriction 1. .990  .958 


000 
| All yw; are equal 1.000 | .9% .958 
| All ws equal 1/2 000 


No restriction -000 
All wi are equal .000 
All yw: equal 1/2 000 


No restriction ‘ 
All yw; are equal ’ | .990 
All ws; equal 1/2 Re .990  .956 


All three methods are | 1. .990 .956  .S889 
equivalent 


All entries should be multiplied by $\sigma^2/n$ to obtain variances


10. Minimizing the sample size while holding the variance constant in time.
In this section we consider the problem of minimizing the sample size of the
pattern at each time $t_i$ while holding constant in time the variance of $M_i$, the
minimum-variance linear unbiased estimate of $a(t_i)$ based on the sample pattern.
The solution can be obtained for one-level or two-level rotation sampling.

We consider the two-level sampling problem first because its solution is simpler.
We assume a sample pattern of the type illustrated by (4.1); at time $t_k$ we add to
the pattern by drawing a new set of $n_k$ elements from the population and
recording the associated sample values for times $t_k$ and $t_{k-1}$, $1 \le k \le i$. We
assume that $\operatorname{var}(M_i'')$ is equal to $\sigma^2/N$ for all values of i. Obviously, $n_1$ is equal to
N. If $\operatorname{var}(M_2'')$ is to be equal to $\sigma^2/N$, then

    $\dfrac{1}{n_2}(1 - a_2 p) = \dfrac{1}{N}.$

We substitute equation (8.3) for $a_2$ and solve for $n_2$:

    $n_2 = N\sqrt{1 - p^2}, \qquad a_2 = \dfrac{1 - \sqrt{1 - p^2}}{p}.$


We can similarly evaluate $n_3$, $n_4$, etc.; we find by induction that all succeeding
$n_i$ and $a_i$ are equal to $n_2$ and $a_2$, respectively. In other words, we should draw
a sample of N elements at time $t_1$, and a sample of $N\sqrt{1 - p^2}$ elements at all
succeeding times. It is rather surprising to find that the minimum value of $n_i$
is attained by time $t_2$; when we considered in Section 4 the inverse problem of
minimizing the variance while keeping the sample size constant, we found that
the variance approached the limiting value $(\sigma^2/n)\sqrt{1 - p^2}$ as i approached
infinity.
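
The constant-variance design can be verified directly from (8.3) and (8.4). The sketch below assumes those two equations in the form $a_i = p\,n_{i-1}/[n_{i-1} + n_i(1 - a_{i-1}p)]$ and $\operatorname{var}(M_i'') = (\sigma^2/n_i)(1 - a_i p)$, with $a_1 = 0$; the function name and the numbers are illustrative.

```python
import math


def two_level_variance_factors(sizes, p):
    """Return var(M_i'') / sigma^2 for each time, given sample sizes n_1, n_2, ..."""
    factors = []
    a = 0.0            # a_1 = 0: nothing earlier to borrow from
    previous_n = None
    for n in sizes:
        if previous_n is not None:
            a = p * previous_n / (previous_n + n * (1 - a * p))   # assumed form of (8.3)
        factors.append((1 - a * p) / n)                           # assumed form of (8.4)
        previous_n = n
    return factors


# Hypothetical target: variance sigma^2 / 1000 at every time, with p = 0.6.
N, p = 1000, 0.6
sizes = [N] + [N * math.sqrt(1 - p * p)] * 5   # N at t_1, then N * sqrt(1 - p^2)
factors = two_level_variance_factors(sizes, p)
```

Every entry of `factors` equals 1/N, so after the first time the same precision is bought with only $\sqrt{1 - p^2}$ of the original sample size.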

Patterson [5] has solved the problem of minimizing the sample size while
holding the variance constant using a one-level rotation sampling pattern; we
summarize his results for the sake of comparison. The problem was first considered
by Jessen [4]; he gave an incorrect solution for a pattern consisting of sample
values from times $t_1$ and $t_2$ only.

The solution to the problem is carried out by induction. Using the terminology
of Section 8, we seek to minimize the sample size $n_k = n_k' + n_k''$. By means of
equation (8.1) and the restriction that $A_{k-1}/n_{k-1}''$ be equal to 1/N (the induction
hypothesis), we can express $n_k$ as a function of $n_k'$ alone and carry out the
minimization with respect to this variable. We find that

    $n_k' = \dfrac{N}{p^2}\left[-(1 - p^2) + \sqrt{1 - p^2}\right]$

and that $n_k''$ is equal to $n_k'$ for k greater than or equal to two. In other words, from
time $t_2$ onward one should draw a sample of $2n_k'$ elements and use a replacement
rate of one-half.
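
The one-level minimum can be checked with a short calculation. The sketch assumes that the estimate at time $t_k$ combines two independent pieces, the mean of the $n_k''$ new elements (variance $\sigma^2/n_k''$) and a regression estimate from the $n_k'$ retained elements (variance $\sigma^2[(1 - p^2)/n_k' + p^2/N]$ under the induction hypothesis), and that their precisions add; function names and numbers are hypothetical.

```python
import math


def total_size(n_matched, N, p):
    """Total sample size n' + n'' that keeps the variance at sigma^2 / N."""
    regression_var = (1 - p * p) / n_matched + p * p / N   # in units of sigma^2
    n_unmatched = N - 1 / regression_var                   # precisions add up to N
    return n_matched + n_unmatched


def optimal_matched(N, p):
    """Size of the retained part that minimizes the total sample size."""
    s = math.sqrt(1 - p * p)
    return N * (s - (1 - p * p)) / (p * p)


N, p = 1000, 0.8
n_matched = optimal_matched(N, p)            # 375 retained elements
n_total = total_size(n_matched, N, p)        # 750 elements in all
```

At the optimum the retained and new parts are equal in size, which is the replacement rate of one-half stated in the text; moving the retained part away from 375 in either direction increases the total.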
For both one-level and two-level rotation sampling, the behavior of the
minimum-sample-size problem can be summarized by the relation

    (minimum variance at time $t_\infty$) / (minimum variance at time $t_1$), using the constant sample size method,

      = (minimum sample size at time $t_2$) / (minimum sample size at time $t_1$), using the constant variance method.

If one wishes to carry out rotation sampling on a truncated pattern, then one 
is restricted to minimum-variance estimates, since patterns with constant sample 
sizes and constant replacement rates are the only ones that can be conveniently 
truncated. 


11. One-level versus multi-level rotation sampling. In this section we derive
a criterion for deciding when to use multi-level sampling instead of one-level
sampling. The criterion is given in the form of a graph in which the correlation
p is plotted against a parameter k which compares the cost of sampling several
different values of an element at one time with the cost of sampling only one
value at a time.

We assume that it costs c to obtain a single sample value at time $t_i$, and
c(1 + k) to obtain both sample values $x_{ij}$ and $x_{(i-1)j}$ at time $t_i$, where 0 < k < 1.
In other words, we allow for the fact that it may cost less to obtain two sample
values at a single time than it does to obtain them at two separate times. The
three-level sampling cost is assumed to be c(1 + 2k) per element.

Suppose that we have a fixed amount T to spend on our sample at time $t_i$; how
many sample values can we draw, using each of the three methods of sampling? If
we restrict ourselves to patterns of constant sample size n, then for one-level
sampling the sample size $n'$ is equal to $T/c$, for two-level sampling $n''$ is equal to
$T/c(1 + k)$, and for three-level sampling $n'''$ is equal to $T/c(1 + 2k)$.

We begin by assuming that we have patterns of infinite length. Equating the
variance of the one-level estimate for $\mu$ equal to one-half with the variance of
the two-level estimate, we have

    $\dfrac{2\sigma^2}{n'} \cdot \dfrac{-(1 - p^2) + \sqrt{1 - p^2}}{p^2} = \dfrac{\sigma^2}{n''}\sqrt{1 - p^2}.$

Substituting in the values of $n'$ and $n''$ and solving for k, we find the curve on
which $\operatorname{var}(M')$ is equal to $\operatorname{var}(M'')$:

    $k = \left[\dfrac{1 - \sqrt{1 - p^2}}{p}\right]^2.$
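
Both break-even curves can be evaluated numerically. The sketch assumes the limiting variance factors $\sqrt{1 - p^2}$ for two-level sampling and $1 - (p^2/4)(5 - p^2 - \sqrt{(1 - p^2)(9 - p^2)})$ for three-level sampling, together with the cost model of this section (sample sizes proportional to $1/(1 + k)$ and $1/(1 + 2k)$); the function names are hypothetical.

```python
import math


def k_one_vs_two(p):
    """Cost parameter k at which one-level (mu = 1/2) and two-level variances agree."""
    return ((1 - math.sqrt(1 - p * p)) / p) ** 2


def three_level_limit_factor(p):
    """Assumed limiting value of var(M''') divided by sigma^2/n."""
    p2 = p * p
    return 1 - (p2 / 4) * (5 - p2 - math.sqrt((1 - p2) * (9 - p2)))


def k_two_vs_three(p):
    """Cost parameter k at which two-level and three-level variances agree."""
    ratio = three_level_limit_factor(p) / math.sqrt(1 - p * p)
    # Equal variances require ratio = n'''/n'' = (1 + k) / (1 + 2k); solve for k.
    return (1 - ratio) / (2 * ratio - 1)


curve_1 = [round(k_one_vs_two(p), 3) for p in (0.2, 0.4, 0.6, 0.8, 0.9)]
curve_2 = [round(k_two_vs_three(p), 3) for p in (0.2, 0.4, 0.6, 0.8, 0.9)]
```

The second curve lies far below the first, so three-level sampling pays only when the extra values are almost free and the correlation is high.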


In order to decide when to use a three-level estimate instead of a two-level one,
we equate the variance of the two-level estimate with the variance of the
three-level estimate, and solve for k. Both of these curves are plotted in Graph 11.1;
the one-level versus two-level curve is labeled (1) and the two-level versus
three-level curve is labeled (2). The two curves partition the (k, p) unit square into
three areas: in the large area at the upper left, $\operatorname{var}(M')$ is less than $\operatorname{var}(M'')$
or $\operatorname{var}(M''')$; in the central area $\operatorname{var}(M'')$ is the minimum; in the small
area at the lower right $\operatorname{var}(M''')$ is the minimum.

If we have very short truncated patterns, the preceding analysis no longer
holds. In order to see what happens, we solve the equations $\operatorname{var}(M_i') = \operatorname{var}(M_i'')$
and $\operatorname{var}(M_i'') = \operatorname{var}(M_i''')$ for i equal to two and three. In comparing the
variances of $M_2'$ and $M_2''$, we have two possibilities to consider: we can use the
simple one-level pattern in which $\mu$ is equal to one-half, or we can use the
optimum one-level pattern in which $\mu$ is equal to $(1 - \sqrt{1 - p^2})/p^2$. We summarize
all of these results in the table below and in Graph 11.1.


Values of k for which the indicated variances are equal

Variances equated                   p = .2    .4     .6     .8     .9     .95
(1) M' = M''                          .010   .044   .111   .250   .393   .524
(2) M'' = M'''                        .000   .001   .008   .036   .085   .151
(3) M_2' = M_2'', mu = 1/2            .010   .042   .099   .191   .254   .291
(4) M_2' = M_2'', mu optimum          .010   .042   .098   .177   .207   .196
(5) M_2'' = M_2'''                    .000   .000   .000   .000   .000   .000
(6) M_3' = M_3'', mu = 1/2            .010   .044   .110   .235   .340   .411
(7) M_3'' = M_3'''                    .000   .001   .005   .012   .014   .010


Graph 11.1 clearly shows that the higher-level sampling patterns are optimum 
over a very restricted area; it is advantageous to undertake four-level or higher- 
level rotation sampling only for very low k, very high p, and relatively long pat- 
terns. 







GRAPH 11.1
Comparison of one-level rotation sampling with multi-level rotation sampling 


12. Estimation of a linear function of the population means. We have
discussed at some length the problem of finding minimum-variance linear unbiased
estimates $M_i$ of the population mean $a(t_i)$ based on several different sample
patterns. In the next two sections, we consider the more general problem of finding
$M(a)$, the minimum-variance linear unbiased estimate of a linear function of the
population means at several different times:

    $a = c_1 a(t_1) + c_2 a(t_2) + \cdots + c_i a(t_i),$

where the letter c denotes a constant.

The problem of finding minimum-variance estimates $M(a)$ contains the
problem of improving the minimum-variance estimate $M_i$ of $a(t_i)$ by using
observations drawn at times $t_{i+1}$ or later. (Previously, we assumed that these later
observations were not available.) The connection between these problems is
presented in Theorem 3 below, which can easily be proved using Theorem 1 in both
directions. Following Patterson's notation, we denote the minimum-variance
linear unbiased estimate of $a(t_i)$ based on a sample pattern containing values
drawn at times $t_1, \ldots, t_i, \ldots, t_j$ by the symbol ${}_jM_i$. When j is equal to i,
we write $M_i$ as before.

THEOREM 3. Assume the conditions of Theorem 1. Assume also that we know the
estimates ${}_mM_k$, k = 1, 2, ..., m, based on the sample pattern P. Then the
minimum-variance estimate $M(a)$ of $a = c_1 a(t_1) + c_2 a(t_2) + \cdots + c_m a(t_m)$ in the
class of linear unbiased estimates based on the sample pattern P is

    $M(a) = c_1({}_mM_1) + c_2({}_mM_2) + \cdots + c_m({}_mM_m).$


Using this theorem and the results derived in the next section, it is a straight- 
forward but tedious job to write down minimum-variance estimates of any linear 
function of the population means; therefore we omit specific details or examples. 
It is equally simple to calculate the variances of these estimates. The linear
function of greatest practical interest and most frequently discussed in the literature
is the difference between two successive means $a(t_i) - a(t_{i-1})$. Other functions of
potential interest are higher-order differences and moving averages.


13. Improvement of minimum-variance estimates of the mean. Patterson
[5] has solved the problem of finding improved minimum-variance linear
unbiased estimates ${}_jM_i'$ of $a(t_i)$ based on one-level rotation sampling of patterns of
the type illustrated by (3.1). He shows how to solve for the estimates iteratively.
For example:


(13.1) ‘Mia = Mia — pAiaM: + pAirdina, 
(13.2) ss = Mi. —_ pA,2(1 _— Ais)M; = pA;_2(1 —_ Aj-1)%;,1. 


Patterson also derives variances and covariances based on the improved
estimates. However, when j becomes much larger than i in the estimate ${}_jM_i'$, these
expressions become quite cumbersome; consequently Patterson derives
non-minimum-variance linear unbiased estimates of $a(t_i)$ that have a somewhat
simpler form.

The problem of finding ${}_jM_i''$ using two-level rotation sampling can be solved
rather completely; it is not necessary to resort to non-minimum-variance
estimates in order to obtain manageable expressions for the estimates and the
variances and covariances between them. In fact, it is possible to obtain
variances and covariances of improved estimates based on infinitely long patterns.

We assume that we know $M_1'', M_2'', \ldots, M_i''$ and wish to obtain ${}_iM_{i-k}''$ when
k is greater than or equal to one. We can partition the sample pattern illustrated
in (4.1) into two statistically independent parts, the left-hand one a two-level
sampling pattern running from time $t_0$ through time $t_{i-k}$, and the right-hand one
a two-level sampling pattern running from time $t_{i-k}$ through time $t_i$. Each part
can be used to obtain a minimum-variance linear unbiased estimate of $a(t_{i-k})$;
the estimate based on the right-hand pattern is $M_k''$ (with inverted time), and
the estimate based on the left-hand pattern is $M_{i-k}''$. Applying Lemma 1 of
Section 5, we conclude that the minimum-variance linear unbiased estimate of
$a(t_{i-k})$ based on the entire pattern is

    ${}_iM_{i-k}'' = \dfrac{a}{a + b}\, M_k'' + \dfrac{b}{a + b}\, M_{i-k}''$

where a is equal to $\operatorname{var}(M_{i-k}'')$ and b is equal to $\operatorname{var}(M_k'')$. Furthermore, the
variance of this estimate is

    $\operatorname{var}({}_iM_{i-k}'') = \dfrac{\sigma^2}{n} \cdot \dfrac{(1 - a_{i-k}p)(1 - a_k p)}{(1 - a_{i-k}p) + (1 - a_k p)}.$

The first three terms can be written down in an equivalent form:

    $\operatorname{var}({}_iM_{i-1}'') = \dfrac{\sigma^2}{n} \cdot \dfrac{1}{p}\, a_i (1 - a_{i-1}p),$

    $\operatorname{var}({}_iM_{i-2}'') = \dfrac{\sigma^2}{n} \cdot \dfrac{2 - p^2}{p^2}\, a_i a_{i-1} (1 - a_{i-2}p),$

    $\operatorname{var}({}_iM_{i-3}'') = \dfrac{\sigma^2}{n} \cdot \dfrac{4 - 3p^2}{p^3}\, a_i a_{i-1} a_{i-2} (1 - a_{i-3}p).$

If we let i equal 2k, then the variance becomes

    $\operatorname{var}({}_{2k}M_k'') = \dfrac{\sigma^2}{n} \cdot \dfrac{1 - a_k p}{2} = \dfrac{1}{2}\operatorname{var}(M_k'')$

which could have been predicted at once from (4.1). The variance of this estimate
is less than that of any other estimate of a single mean based on a pattern of the
same length.
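
The equivalence of the general variance formula and its first three special cases can be confirmed numerically. The sketch assumes a constant sample size, so that $a_1 = 0$ and $a_i = p/(2 - a_{i-1}p)$, and works with variances in units of $\sigma^2/n$; the function names are hypothetical.

```python
def two_level_a(p, i):
    """Coefficient a_i for constant sample size (assumed: a_1 = 0, a_i = p / (2 - a_{i-1} p))."""
    a = 0.0
    for _ in range(i - 1):
        a = p / (2 - a * p)
    return a


def improved_variance_factor(p, i, k):
    """var of the improved estimate of a(t_{i-k}) from a pattern ending at t_i (1 <= k < i)."""
    left = 1 - two_level_a(p, i - k) * p    # left-hand pattern, ending at t_{i-k}
    right = 1 - two_level_a(p, k) * p       # right-hand pattern, with inverted time
    return left * right / (left + right)


p, i = 0.7, 6
a = {j: two_level_a(p, j) for j in range(1, i + 1)}
form_1 = a[6] * (1 - a[5] * p) / p
form_2 = (2 - p * p) / p ** 2 * a[6] * a[5] * (1 - a[4] * p)
form_3 = (4 - 3 * p * p) / p ** 3 * a[6] * a[5] * a[4] * (1 - a[3] * p)
```

With i = 6 and k = 3 the two halves of the pattern are mirror images, and the improved variance is exactly half of the variance of $M_3''$.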
The covariance terms are equally easy to derive. Using Corollary 1.1 and the
iterative form of the two-level estimate, we obtain

(13.3)    $\operatorname{cov}({}_iM_{i-k}'', M_i'') = \dfrac{\sigma^2}{n}\, a_i a_{i-1} \cdots a_{i-k+1} (1 - a_{i-k}p).$

The following covariance-equation contains all the preceding results:

(13.4)    $\operatorname{cov}({}_iM_{i-k}'', {}_iM_{i-j}'') = \dfrac{\sigma^2}{n} \cdot \dfrac{(1 - a_j p)(1 - a_{i-k}p)}{(1 - a_{i-j}p) + (1 - a_j p)}\, a_{i-j} a_{i-j-1} \cdots a_{i-k+1}.$

We require that 0 < j < k < i; the second inequality is not a restriction. The
derivation of equation (13.4) is straightforward:

    $\operatorname{cov}({}_iM_{i-k}'', {}_iM_{i-j}'') = \operatorname{cov}(M_{i-k}'', {}_iM_{i-j}'') = \dfrac{1 - a_j p}{(1 - a_{i-j}p) + (1 - a_j p)} \operatorname{cov}(M_{i-k}'', M_{i-j}'').$

If we substitute equation (13.3) into the above expression, we obtain (13.4) at
once. When k equals j, we assume that $a_{i-j} \cdots a_{i-k+1}$ is equal to one.
It is easy to derive limiting results from equation (13.4). For example: 


o (1 — ap)a 


lim cov(s;M7% , M7.) = 


ic nt 2 


14. A further relationship between one-level and two-level rotation sampling.
If we restrict ourselves to a replacement rate $\mu$ of one-half while using one-level
rotation sampling, then it is not necessary to use the cumbersome formulas
developed by Patterson in [5]; we can write down the estimates ${}_jM_i'$ and the
associated variances and covariances in terms of the two-level estimates ${}_jM_i''$. This
section extends the results of Section 5, which showed how to write $M_i'$ as a
function of $M_i''$. The simplified results derived below can be quite useful in
practice, since a replacement rate of one-half was shown to be nearly optimum in
Section 9.

Consider the minimum-variance estimate ${}_{i+1}M_i''$ which is based on the sample
pattern (4.1) extending from time $t_0$ to time $t_{i+1}$. The estimate ${}_{i+1}M_i''$ is also a
minimum-variance linear unbiased estimate based on the sample values
extending from time $t_1$ to time $t_i$; the sample values associated with time $t_0$ and
time $t_{i+1}$ are assigned zero weights in the minimum-variance estimate and can be
disregarded. In other words, ${}_{i+1}M_i''$ is the minimum-variance estimate based on
pattern (3.1) with $\mu$ equal to one-half and a sample size of 2n instead of n;
${}_{i+1}M_i''$ is identically equal to $M_i'$. Using a similar argument, one can easily show
that ${}_{i+1}M_{i-k}''$ (based on a sample size of n) is identically equal to ${}_iM_{i-k}'$ (with
$\mu$ equal to 1/2 and a sample size of 2n). The variance-covariance formulas are
summarized by the general expression

    $\operatorname{cov}({}_iM_{i-j}', {}_iM_{i-k}') = 2 \operatorname{cov}({}_{i+1}M_{i-j}'', {}_{i+1}M_{i-k}'').$
MULTI-LEVEL CONTINUOUS SAMPLING PLANS¹


By GERALD J. LIEBERMAN AND HERBERT SOLOMON


Stanford University and Teachers College, Columbia University


1. Summary and introduction. In 1943 Dodge [1] published a sampling 
inspection plan for a continuous production line. He assumed the production 
process to be in statistical control and also assumed the items were classified, 
after measurement, as “defective” or “non-defective”. Dodge derived the 
Average Outgoing Quality (AOQ) function for his plan, obtained the Average 
Outgoing Quality Limit (AOQL), and provided a graphical procedure for 
choosing the parameters of the plan which guarantee a specified AOQL. Wald 
and Wolfowitz [2], in 1945, discussed a sampling inspection plan for continuous 
production which insures a prescribed limit on the outgoing quality even when 
production is not in statistical control. However, they demonstrate an aware- 
ness of the penalty involved in accomplishing this end and discuss other desirable 
features an optimal plan should enjoy, namely, a minimum amount of inspection 
to reduce inspection costs, and protection to insure what they term “local 
stability”, i.e., the ability to detect quickly “too many long sequences” of poor 
quality. Dodge in his paper also discusses minimum inspection and an idea 
similar to “local stability” which he calls “protection against spotty quality”.

An inconvenient feature of both plans is the abrupt change between partial 
inspection and 100% inspection. This can lead to hardships in personnel assign- 
ments in the administration of an inspection program. For example, in an item 
such as aircraft engines, a smoother transition to 100% inspection is needed. 
Both plans also tend to produce a form of tightened inspection when the process 
average may not warrant it. In a later paper [3] Dodge considers two modifica- 
tions of his plan which delay the beginning of 100% inspection and also add 
some insurance for local stability. He derived the AOQ function for each of the 
two plans. 

The primary purpose of this paper is to consider an extension of Dodge’s 
first plan which (a) allows for smoother transition between sampling inspection 
and 100% inspection, (b) requires 100% inspection only when the quality 
submitted is quite inferior, and (c) allows for a minimum amount of inspection 
when quality is definitely good. This aim is accomplished by the introduction of 
a multi-level sampling plan which specifically allows for any number of sampling 
levels subject to the provision that transitions can only occur between adjacent 
levels. This inspection plan will be recognized as a random walk model with 
reflecting barriers. The first Dodge plan is easily recognized as a special case 
containing only one sampling level. 


Received September 21, 1954, revised May 16, 1955. 
1 Done under Office of Naval Research sponsorship, Contract N6onr-25126; reproduc- 
tion in whole or in part permitted for any U. S. Government purpose. 
The AOQ function for the plan is derived and contours of constant AOQL
are developed for a two-level and an infinite-level plan. These are added to the 
contours of constant AOQL for Dodge’s single-level plan to present a picture in 
Figs. 1, 2 and 3 reflecting the relationship between a fixed AOQL contour and 
the number of sampling levels used in the plan. In addition an approximation 
procedure is presented for determining contours of constant AOQL when the 
number of sampling levels lies between three and infinity. 

For a desired AOQL and a given process average, criteria for selecting a specific 
multi-level plan are discussed. 


2. The Multi-Level Inspection Plan (MLP). The plan proposed in this paper 
is as follows: 

0) At the outset inspect 100 per cent of the units consecutively as produced 
and continue such inspection until $i$ units in succession are found clear of defects. 

1) When $i$ units in succession are found clear of defects, discontinue 100 per 
cent inspection and inspect only a fraction $f$ of the units (i.e., one out of every 
$1/f$, where $1/f$ is an integer). If the next $i$ inspected units are non-defective, 
proceed to the next level; if a defective occurs, revert immediately to 100 per 
cent inspection. 

2) When at rate $f$, $i$ inspected units are found clear of defects, discontinue 
sampling at rate $f$ and proceed to sampling at rate $f^2$. If the next $i$ inspected 
units are non-defective, proceed to the next level; if a defective occurs, revert 
immediately to sampling at rate $f$. 

3) When at rate $f^2$, $i$ inspected units are found clear of defects, discontinue 
sampling at rate $f^2$ and proceed to sampling at rate $f^3$. If the next $i$ inspected 
units are non-defective, proceed to the next level; if a defective occurs, revert 
immediately to sampling at rate $f^2$. 

. . . 

k - 1) When at rate $f^{k-2}$, $i$ inspected units are found clear of defects, 
discontinue sampling at rate $f^{k-2}$ and proceed to sampling at rate $f^{k-1}$. If the 
next $i$ inspected units are non-defective, proceed to the next level; if a defective 
occurs, revert immediately to sampling at rate $f^{k-2}$. 

k) When at rate $f^{k-1}$, $i$ inspected units are found clear of defects, 
discontinue sampling at rate $f^{k-1}$ and proceed to sampling at rate $f^k$. If a defective 
occurs, revert immediately to sampling at rate $f^{k-1}$; otherwise, continue sampling 
at rate $f^k$. 

Whenever sampling is in operation, one item should be selected at random 
from each segment of $1/f^j$ ($j = 0, 1, 2, \dots, k$) production items. During both 
sampling inspection and 100 per cent inspection all defective items found should 
either be corrected or replaced with good items. 
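The level-switching rule in steps 0) to k) can be sketched as a small state machine. This is an illustrative reading of the plan, not code from the paper: the names `mlp_step` and `levels_used` are hypothetical, level 0 stands for 100 per cent inspection, and level $j$ for sampling at rate $f^j$.

```python
def mlp_step(level, run, defective, i, k):
    """Advance the plan by one inspected item.

    level     -- current level (0 = 100 per cent inspection, j = rate f**j)
    run       -- consecutive non-defectives seen so far at this level
    defective -- True if the item just inspected was defective
    i, k      -- clearance number and number of sampling levels
    Returns the (level, run) pair that applies to the next inspected item.
    """
    if defective:
        # Revert one level (or restart the count on 100 per cent inspection).
        return max(level - 1, 0), 0
    if level == k:
        # Top level: keep sampling at rate f**k until a defective occurs.
        return k, 0
    run += 1
    if run == i:
        return level + 1, 0
    return level, run


def levels_used(outcomes, i, k):
    """Level in force for each inspected item (outcomes: 0 = good, 1 = defective)."""
    level, run, used = 0, 0, []
    for x in outcomes:
        used.append(level)
        level, run = mlp_step(level, run, bool(x), i, k)
    return used
```

For example, with $i = 2$ and $k = 2$ the outcome sequence 0, 0, 0, 0, 0, 1, 0, 1, 1 is inspected at levels 0, 0, 1, 1, 2, 2, 1, 1, 0.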

This plan will be called the Multi-Level Continuous Inspection Plan (MLP). 
For $k = 1$, it reduces to the first Dodge Plan. The MLP plan is one of a general 
class of multi-level plans which have the property that provision is made for 
smaller sampling rates when quality is good. It was specifically chosen because 
it made mathematical and computational analysis tractable and still maintained 
all the fundamental ideas of multi-level sampling plans. In fact, results 
for more general multi-level plans are given in Sections 3 and 4. 


3. The AOQ function for MLP. Suppose the inspection plan is generalized so 
that at the $j$th sampling level ($j = 0, 1, 2, \dots, k$) there is a sampling rate $f_j$ and 
$i_j$ non-defectives must occur to proceed to rate $f_{j+1}$; $f_0 = 1$ (100 per cent 
inspection), $i_k$ is infinite. While on 100 per cent inspection $i_0$ successive units 
must be non-defective before proceeding to the first sampling level. In MLP, 
$f_j = f^j$ and $i_j = i_0 = i$, $j \ne k$. The AOQ function for this more general 
inspection plan can be derived without any more complexity than the AOQ for MLP 
and this will now be done. It will be assumed, of course, that the production 
process is in control, i.e., qualities of the items are mutually independent 
binomial random variables with constant parameter $p$. 

Let a “unit” be a group of $f_j^{-1}$ ($j = 0, 1, \dots, k$) successive production items 
from which one is to be chosen at random for inspection. After the inspection 
of any item the size of the unit from which the next item is to be chosen for 
inspection is determined by past history according to the given rule. Suppose 
we represent the result of the $m$th inspection by the random variable $x_m$, where 
$x_m$ is zero if the inspected item is non-defective and is one if it is defective. 
Then a sequence $(x_1, x_2, \dots, x_m, \dots)$ represents results on successive 
inspection trials and can be considered a point in sample space. A particular 
sampling plan attaches an integer from the set $f_0^{-1}, f_1^{-1}, f_2^{-1}, \dots, f_k^{-1}$ to each 
coordinate ($x_m$) of the sample point. Which integer gets attached to a particular 
$x_m$ depends on $x_1, x_2, \dots, x_{m-1}$. The integer attached to $x_m$ is the number of 
production items in the unit from which a member is inspected with result $x_m$. 

If $f_{j_m}^{-1}$ is the integer attached to $x_m$ in the sequence $(x_1, x_2, \dots, x_m, \dots)$, 
then the reciprocal of the average fraction inspected for that sequence is 

$$\lim_{n\to\infty} \frac{1}{n} \sum_{m=1}^{n} f_{j_m}^{-1} \tag{1}$$

provided the limit exists. Equation (1) can be written as 

$$\lim_{n\to\infty} \sum_{j=0}^{k} f_j^{-1}\, \frac{g_j^{(n)}}{n} \tag{2}$$

where $g_j^{(n)}$ is the number of times that sampling from a sequence of individual 
production items of length $f_j^{-1}$ occurs in the first $n$ trials. Now, define the 
reciprocal of the average fraction inspected, $F^{-1}$, as 

$$F^{-1} = \sum_{j=0}^{k} f_j^{-1} P_j \tag{3}$$

where $P_j$ is the probability that, for a “randomly chosen” $m$, $x_m$ is the result of 
inspection of an item selected from a unit of size $f_j^{-1}$. 

If it can be shown that $\lim_{n\to\infty} g_j^{(n)}/n = P_j$ almost everywhere, equations (1) 
and (3) become identical. Thus, the average fraction inspected can be represented 
in terms of the steady state probabilities. In order to prove these results, it is 
useful to relate this process to a Markov chain. 

Consider a sequence of trials. At any time $m$ (after the $m$th observation) the 
system is in state 

$$E_v \quad (v = 0, 1, 2, \dots, i_0,\ i_0+1, \dots, i_0+i_1,\ i_0+i_1+1, \dots, i_0+i_1+i_2,\ i_0+i_1+i_2+1, \dots, i_0+i_1+\cdots+i_{k-1}+1)$$

where 

$E_0$ = state where the $m$th trial resulted in beginning 100 per cent 
inspection. It signifies a defective item observed while sampling 
at rate $f_1$ or during 100 per cent inspection. 

$E_1$ = state where the $(m-1)$st trial resulted in beginning 100 
per cent inspection and where the $m$th trial resulted in a 
non-defective. 

$E_2$ = state where the $(m-2)$nd trial resulted in beginning 100 
per cent inspection and where the $(m-1)$st and $m$th trials 
resulted in non-defectives. 

. . . 

$E_{i_0}$ = state where the $(m-i_0)$th trial resulted in beginning 100 
per cent inspection and where the next $i_0$ trials resulted in 
non-defectives. This means that sampling at rate $f_1$ is to 
begin. 

$E_{i_0+1}$ = state where the $(m-1)$st trial resulted in allowing sampling 
at rate $f_1$ to begin and where the $m$th trial resulted in a 
non-defective. 

$E_{i_0+2}$ = state where the $(m-2)$nd trial resulted in allowing sampling 
at rate $f_1$ to begin and where the next two trials resulted in 
non-defectives. 

. . . 

$E_{i_0+i_1+\cdots+i_{k-1}+1}$ = state where the $(m-1)$st trial resulted in allowing sampling 
at rate $f_k$ to begin or sampling at rate $f_k$ is in operation, 
and where the $m$th trial resulted in a non-defective. 
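The matrix of transition probabilities is given by 

$$p_{r,0} = p \quad \text{for } r = 0, 1, \dots, i_0+i_1-1$$

$$p_{r,i_0} = p \quad \text{for } r = i_0+i_1,\ i_0+i_1+1, \dots, i_0+i_1+i_2-1$$

$$p_{r,i_0+i_1} = p \quad \text{for } r = i_0+i_1+i_2, \dots, i_0+i_1+i_2+i_3-1$$

. . . 

$$p_{r,\,i_0+i_1+\cdots+i_{k-2}} = p \quad \text{for } r = i_0+i_1+\cdots+i_{k-1},\ i_0+i_1+\cdots+i_{k-1}+1$$

$$p_{r,s} = q = 1-p \quad \text{for } r = s-1,\ s = 1, 2, \dots, i_0+i_1+\cdots+i_{k-1}+1$$

$$p_{i_0+i_1+\cdots+i_{k-1}+1,\; i_0+i_1+\cdots+i_{k-1}+1} = q$$

all other $p_{r,s} = 0$. 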


A result² of a theorem of Chung [5] indicates that this type of Markov chain 
has the property that 

$$\lim_{n\to\infty} g_v^{*(n)}/n \text{ exists and is equal to a unique } P_v^*$$

almost everywhere, where $g_v^{*(n)}$ is the number of items in state $E_v$ in the first $n$ 
trials and $P_v^*$ is the “steady state” probability that for a “randomly chosen” 
$m$, $x_m$ is in state $E_v$. Since $g_j^{(n)}$ is a finite sum of $g_v^{*(n)}$ and $P_j$ is a finite sum of 
$P_v^*$, we get $\lim_{n\to\infty} \sum_{j=0}^{k} f_j^{-1} g_j^{(n)}/n$ exists almost everywhere and 
$\lim_{n\to\infty} g_j^{(n)}/n = P_j$ almost everywhere. Furthermore, since the process is an 
irreducible, aperiodic, finite Markov chain, the limiting stationary probabilities are 
independent of the initial probabilities. This implies that the AOQ does not depend on which 
sampling rate is used initially, e.g., the process can start at sampling rate $f_j$, 
$j \ne 0$. 
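The steady-state argument can be checked numerically on a small case. The sketch below builds a chain on the states $E_v$ and iterates it to its stationary distribution; the transition rule it uses (a defective sends level $j$ back to the starting state of level $j-1$, a non-defective advances one state) is a reading of the plan's rules made for illustration, and the function name `level_probabilities` is hypothetical.

```python
def level_probabilities(p, clearances, iters=20000):
    """Approximate P_0, ..., P_k by power iteration.

    p          -- probability that an inspected item is defective
    clearances -- [i_0, i_1, ..., i_{k-1}], clearance numbers per level
    Assumes 0 < p < 1 so that the chain is irreducible and aperiodic.
    """
    q = 1.0 - p
    k = len(clearances)
    starts = [0]
    for i_j in clearances:
        starts.append(starts[-1] + i_j)      # starts[j] = i_0 + ... + i_{j-1}
    last = starts[k] + 1                     # index of the final state
    n_states = last + 1

    def level_of(r):
        for j in range(k):
            if r < starts[j + 1]:
                return j
        return k

    dist = [1.0] + [0.0] * last              # start on 100 per cent inspection
    for _ in range(iters):
        new = [0.0] * n_states
        for r, mass in enumerate(dist):
            j = level_of(r)
            new[starts[max(j - 1, 0)]] += p * mass   # defective: drop one level
            new[min(r + 1, last)] += q * mass        # non-defective: advance
        dist = new

    probs = [0.0] * (k + 1)
    for r, mass in enumerate(dist):
        probs[level_of(r)] += mass
    return probs
```

Under these assumptions the ratios of successive level probabilities come out as $q^{i_j}/(1 - q^{i_j})$, in line with the statement that each $P_j$ is a multiple of $P_0$.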

In order to calculate the values of $P_j$, $j = 0, 1, \dots, k$, it is necessary to 
introduce some definitions. $P_j$ has already been defined as the probability that, 
for a “randomly chosen” $m$, $x_m$ is the result of a choice from a unit of size $f_j^{-1}$. 
Let a prime attached to the $P_j$'s denote the probability that for a “randomly 
chosen” $m$, $x_m$ is the result of a choice from a unit of size $f_j^{-1}$, and $x_{m-1} = 0$ 
while sampling from a unit of size $f_{j-1}^{-1}$ or $x_{m-1} = 1$ while sampling from a unit 
of size $f_{j+1}^{-1}$, $j = 1, 2, \dots, k-1$. $P_0'$ is the probability that for a “randomly 
chosen” $m$, $x_m$ is the result of a choice from a unit of size $f_0^{-1} = 1$ and $x_{m-1} = 1$ 
while sampling at rate $f_0$ or $f_1$. $P_k'$ is the probability that for a “randomly 
chosen” $m$, $x_m$ is the result of a choice from a unit of size $f_k^{-1}$ and $x_{m-1} = 0$ 
while sampling from a unit of size $f_{k-1}^{-1}$. In other words, $P_j'$ denotes the 
probability of beginning sampling at level $j$. 


2 The authors are indebted to Professor Samuel Karlin for pointing out the applicability 
of Chung’s paper. 
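Then we may write 

$$\begin{aligned} P_0' &= P_0'[p + qp + q^2p + \cdots + q^{i_0-1}p] + P_1'[p + qp + q^2p + \cdots + q^{i_1-1}p] \\ &= P_0'[1 - q^{i_0}] + P_1'[1 - q^{i_1}] \end{aligned} \tag{4}$$

$$P_0 = P_0'[1 + q + q^2 + \cdots + q^{i_0-1}] = P_0'\left[\frac{1 - q^{i_0}}{p}\right] \tag{5}$$

$$\begin{aligned} P_1' &= P_0' q^{i_0} + P_2'[p + qp + \cdots + q^{i_2-1}p] \\ &= P_0' q^{i_0} + P_2'[1 - q^{i_2}] \end{aligned} \tag{6}$$

$$P_1 = P_1'[1 + q + q^2 + \cdots + q^{i_1-1}] = P_1'\left[\frac{1 - q^{i_1}}{p}\right] \tag{7}$$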


In a similar manner we get 


(9) 


(10) 
Solving equations (4)-(15) for $P_0$ we then find $P_1, P_2, \dots, P_k$ since each is a 
multiple of $P_0$, namely 

$$P_1 = P_0\,\frac{q^{i_0}}{1 - q^{i_0}}, \quad \dots, \quad P_k = P_0\,\frac{q^{i_0} q^{i_1} \cdots q^{i_{k-1}}}{(1 - q^{i_0})(1 - q^{i_1}) \cdots (1 - q^{i_{k-1}})}.$$


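The AOQ function can be written as 

$$\mathrm{AOQ} = p(1 - F) = p\left[1 - \frac{1}{\sum_{j=0}^{k} f_j^{-1} P_j}\right]. \tag{16}$$

Substituting all the values we get 

$$\mathrm{AOQ} = p\;\frac{\left(\dfrac{1}{f_1} - 1\right)\dfrac{q^{i_0}}{1 - q^{i_0}} + \left(\dfrac{1}{f_2} - 1\right)\dfrac{q^{i_0} q^{i_1}}{(1 - q^{i_0})(1 - q^{i_1})} + \cdots + \left(\dfrac{1}{f_k} - 1\right)\dfrac{q^{i_0} q^{i_1} \cdots q^{i_{k-1}}}{(1 - q^{i_0})(1 - q^{i_1}) \cdots (1 - q^{i_{k-1}})}}{1 + \dfrac{1}{f_1}\dfrac{q^{i_0}}{1 - q^{i_0}} + \dfrac{1}{f_2}\dfrac{q^{i_0} q^{i_1}}{(1 - q^{i_0})(1 - q^{i_1})} + \cdots + \dfrac{1}{f_k}\dfrac{q^{i_0} q^{i_1} \cdots q^{i_{k-1}}}{(1 - q^{i_0})(1 - q^{i_1}) \cdots (1 - q^{i_{k-1}})}} \tag{16a}$$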


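Now consider MLP, i.e., $f_j = f^j$; $i_j = i$ ($j \ne k$); 
then 

$$\mathrm{AOQ} = p\;\frac{\sum_{j=1}^{k} (f^{-j} - 1)\,[q^i/(1 - q^i)]^j}{1 + \sum_{j=1}^{k} f^{-j}\,[q^i/(1 - q^i)]^j}. \tag{17}$$

Let $z = (1/f)\,q^i/(1 - q^i)$, then 

$$\mathrm{AOQ} = p\;\frac{[z + z^2 + \cdots + z^k] - [(fz) + (fz)^2 + \cdots + (fz)^k]}{1 + z + z^2 + \cdots + z^k} \tag{17a}$$

or 

$$\mathrm{AOQ} = p\left[1 - \frac{(1 - z)\,(1 - (fz)^{k+1})}{(1 - fz)\,(1 - z^{k+1})}\right]. \tag{17b}$$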
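As a quick numerical illustration of the MLP formula in terms of $z$, the sketch below evaluates the AOQ from the finite sums and compares the case $k = 1$ with the familiar single-level (Dodge) expression $p(1-f)q^i/[f + (1-f)q^i]$. The function names are hypothetical and the code assumes $0 < p < 1$.

```python
def aoq_mlp(p, f, i, k):
    """AOQ of the k-level plan with rates f, f**2, ..., f**k and clearance i."""
    q_i = (1.0 - p) ** i
    z = q_i / (f * (1.0 - q_i))
    numerator = sum(z**j - (f * z) ** j for j in range(1, k + 1))
    denominator = sum(z**j for j in range(0, k + 1))
    return p * numerator / denominator


def aoq_dodge(p, f, i):
    """AOQ of the single-level plan, used here only as a cross-check."""
    q_i = (1.0 - p) ** i
    return p * (1.0 - f) * q_i / (f + (1.0 - f) * q_i)
```

With `k = 1` the two functions agree to rounding error, which is the reduction to the first Dodge plan noted in Section 2.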

4. Monotonicity of the AOQ and the AOQL.³ It is intuitively apparent that 
in MLP, the average fraction inspected (AFI), for a fixed process average, 
should decrease as $k$ increases or equivalently that the AOQ, for a fixed process 
average, should increase, since AOQ $= p(1 - F)$ where $F$ is the AFI. It is also apparent 
that this result holds more generally than for MLP. 

Let us return to the general model of Section 3. A particular sampling plan 
attaches an integer from the sequence $1, f_1^{-1}, f_2^{-1}, \dots, f_k^{-1}$ ($1 < f_j^{-1} < f_{j+1}^{-1}$) to 
each member of a given sample point or sequence, $(x_1, x_2, \dots)$. If we can show 
that for every point in sample space when $M$ is sufficiently large, $\sum_{m=1}^{M} f_{j_m}^{-1}$ for 
the $k$ level plan is less than $\sum_{m=1}^{M} f_{j_m}^{-1}$ for the $k + 1$ level plan, then we will have 
shown that $F^{-1}$ increases monotonely with $k$ in the above sense. 

But look at the sequences of $f_j^{-1}$'s for a given $(x_1, x_2, \dots)$ in the $k$ and $k + 1$ 
level plans. The second is the same as the first until the first time $f_k^{-1}$ appears 
$i_k$ times in succession. Then at the next step, $f_k^{-1}$ changes to $f_{k+1}^{-1}$ in the second 
sequence. As soon as a defective is observed ($x_m = 1$) for the first subsequent 
time, instead of $f_{k-1}^{-1}$ appearing (as in the first sequence), we use $f_k^{-1}$ in the second, 
etc. The important conclusion is that at every step the $f_j^{-1}$ in the second 
sequence is greater than or equal to that of the first sequence and eventually (with 
probability 1) some strictly greater relationship will appear since the 
probability of sampling at rate $f_{k+1}$ is greater than zero. Moreover, the AOQL must 
then monotonically increase with increasing $k$, and the minimum AFI 
monotonically decrease with increasing $k$. 
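The monotonicity in $k$ can be observed directly for MLP. The following is a small numerical check rather than a proof; `afi_mlp` is a hypothetical name, and the AFI is computed as the ratio of the two geometric sums in $fz$ and $z$ that underlie the AOQ formula of Section 3.

```python
def afi_mlp(p, f, i, k):
    """Average fraction inspected for the k-level MLP (assumes 0 < p < 1)."""
    q_i = (1.0 - p) ** i
    fz = q_i / (1.0 - q_i)        # the quantity f*z
    z = fz / f
    return sum(fz**j for j in range(k + 1)) / sum(z**j for j in range(k + 1))


# AFI for k = 1, ..., 8 at a fixed process average; each value should be
# smaller than the one before, so AOQ = p * (1 - AFI) increases with k.
afi_by_levels = [afi_mlp(0.03, 0.2, 15, k) for k in range(1, 9)]
```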


5. Contours of constant AOQL for fixed k.⁴ For any fixed $k$, it is possible to 
get AOQL contours paralleling those given by Dodge for $k = 1$. However, 
getting the AOQL as an explicit function of $k$, $f$, and $i$, in order to obtain contours 
of constant AOQL appears mathematically intractable. Moreover, the use of 
computational methods is tedious. Even for $k = 2$, it is expeditious to use an 
electronic digital computer to obtain contours of constant AOQL. Nevertheless, 
it is possible to obtain contours of constant AOQL for $k$ infinite and we now 


3 The authors are indebted to the referee for pointing out this proof for the general 
Multi-Level Plan. The authors, in the original manuscript, only proved monotonicity for 
the MLP plan. 

4 The authors conjecture that MLP guarantees an AOQL (different from the one 
presented in this paper) whether or not the process is in a state of statistical control. The proof 
should follow the method presented in [4]. 

proceed to derive them. While their use in MLP may not at first appear realistic, 
they will, together with Dodge’s contours, at least provide some guides and 
insight for $k = 2, 3, 4, \dots$. 


6. AOQL contours for an infinite number of sampling levels. Refer to (17) 
and let $k$ approach infinity. First assume $z < 1$, then certainly $fz < 1$. Thus for 
infinite $k$ we get 

$$\mathrm{AOQ} = p\left[\frac{z(1 - f)}{1 - fz}\right]. \tag{18}$$

Now take $z > 1$, then for infinite $k$ 

$$\mathrm{AOQ} = p. \tag{19}$$

Now $z > 1$ is equivalent to $p < 1 - (f/(1+f))^{1/i}$. Thus when $p < 1 - (f/(1+f))^{1/i}$ 
the derivative of AOQ with respect to $p$ is positive, in fact it is always 1. 
If the derivative of AOQ with respect to $p$ for $p \ge 1 - (f/(1+f))^{1/i}$ 
is always negative, then the AOQL must occur when $p = 1 - (f/(1+f))^{1/i}$ since 
the AOQ is a continuous function in this range. Let us look at $d/dp(\mathrm{AOQ})$ for 
$p \ge 1 - (f/(1+f))^{1/i}$ or equivalently $z \le 1$. For $k$ infinite, 


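$$\mathrm{AOQ} = \frac{1 - f}{f}\left[\frac{p(1 - p)^i}{1 - 2(1 - p)^i}\right],$$

$$\frac{d}{dp}\,\mathrm{AOQ} = \frac{1 - f}{f}\left\{\frac{[(1 - p)^i - ip(1 - p)^{i-1}]\,[1 - 2(1 - p)^i] - 2ip(1 - p)^{2i-1}}{[1 - 2(1 - p)^i]^2}\right\}. \tag{20}$$

Since $(1 - f)/f$ and the denominator inside the braces are always positive we 
turn our attention to the numerator inside the braces. This reduces to 

$$q^{i-1}\{q(1 - 2q^i) - ip\} \tag{21}$$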


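where the terms in the braces are of interest since $q^{i-1}$ is always positive. Now 
the range of interest for $p$ can be written as $p = 1 - \epsilon\,(f/(1+f))^{1/i}$ where $\epsilon$ lies 
between one and zero. Substituting we get 

$$\epsilon\left(\frac{f}{1+f}\right)^{1/i}\left[1 - 2\epsilon^i\left(\frac{f}{1+f}\right)\right] - i\left[1 - \epsilon\left(\frac{f}{1+f}\right)^{1/i}\right] \tag{22}$$

$$= \epsilon\left(\frac{f}{1+f}\right)^{1/i}\left[\frac{1 - f(2\epsilon^i - 1)}{1 + f} + i\right] - i. \tag{22a}$$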


It can now be demonstrated that the maximum value of the first term of (22a) 
is less than $i$, which shows the derivative is always negative. The greatest value 
of $\epsilon\,(f/(1+f))^{1/i}$ is $(1/3)^{1/i}$ since $f \le 1/2$ ($1/f$ is integral); the greatest value 
of $(1 - f(2\epsilon^i - 1))/(1 + f)$ is one, thus the first term cannot exceed 
$(1/3)^{1/i}(1 + i)$, but 

$$(1/3)^{1/i}(1 + i) < i$$

since this leads to 

$$(1 + 1/i)^i < 3$$

and $(1 + 1/i)^i \le e = 2.71828\ldots$. Moreover, since the derivative of AOQ 
with respect to $p$ is zero when $p = 1$, the AOQ function for an infinite number 
of levels may be sketched as follows: 

[Figure: sketch of the AOQ function against $p$ for an infinite number of levels] 

Thus the AOQL occurs at $p = 1 - (f/(1+f))^{1/i}$ and is also equal to that 
value. This yields the explicit relationship 

$$f = \frac{(1 - \mathrm{AOQL})^i}{1 - (1 - \mathrm{AOQL})^i}. \tag{23}$$


It is now quite easy to plot contours of constant AOQL for an infinite number 
of levels and this has been done in Fig. 1 which also contains the same contours 


for k = 1. For k = 1, f represents the sampling rate; for k greater than one, f 
represents the initial sampling rate. 
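The infinite-level results lend themselves to a short numerical sketch: the piecewise AOQ function (equal to $p$ while $z \ge 1$), the AOQL $1 - (f/(1+f))^{1/i}$, and the inverse relation giving $f$ for a desired AOQL. The function names are hypothetical; the code assumes $0 < f \le 1/2$ and $0 < p < 1$.

```python
def aoq_infinite(p, f, i):
    """AOQ for an infinite number of levels."""
    q_i = (1.0 - p) ** i
    if q_i * (1.0 + f) >= f:          # equivalent to z >= 1
        return p
    return p * (1.0 - f) / f * q_i / (1.0 - 2.0 * q_i)


def aoql_infinite(f, i):
    """AOQL of the infinite-level plan: 1 - (f / (1 + f)) ** (1 / i)."""
    return 1.0 - (f / (1.0 + f)) ** (1.0 / i)


def f_for_aoql_infinite(aoql, i):
    """Initial sampling rate that yields the given AOQL for clearance i."""
    s = (1.0 - aoql) ** i
    return s / (1.0 - s)
```

Scanning `aoq_infinite` over a grid of $p$ values gives a maximum that sits at the AOQL, and `f_for_aoql_infinite(aoql_infinite(f, i), i)` returns the original `f` up to rounding.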


7. AOQL contours for MLP when k = 2. Putting $k = 2$ in (17b) we get 

$$\mathrm{AOQ} = p\left[1 - \frac{1 + fz + f^2z^2}{1 + z + z^2}\right] \tag{24}$$

or 

$$\mathrm{AOQ} = \frac{p\,q^i(1 - f)(f + q^i)}{f^2 + f(1 - 2f)\,q^i + (1 - f + f^2)\,q^{2i}}. \tag{24a}$$
Taking the derivative of AOQ with respect to $p$ and setting it equal to zero we 
get after some straightforward manipulation 
IF -f+EUAP MA — df — £EB + 0) + 2] 

(25) + @tif + if] + P72 + 21) — 2") + g'[—20"| 

+ gf + i] — if = 0 
We now desire to obtain the sixteen specified contours of constant AOQL for 
$k = 2$ already obtained by Dodge for $k = 1$ and just obtained in this paper 
for $k$ infinite. While (24a) and (25) uniquely determine the specified contours 
their expeditious computation requires some planning. Any pair $(f, i)$ 
determines a unique value of $q$ given by (25). When this value of $q$ and pair $(f, i)$ 
are substituted in (24a), an AOQL is obtained. 

However, the problem is to find curves of constant AOQL, e.g., AOQL = .10 
(ten per cent). A point $(f, i)$ on this curve was found as follows. For any given 
value of $i$, four points [$(f^{(r)}, i)$, $r = 1, 2, 3, 4$] were chosen lying between the 
$k = 1$ and infinite $k$ contours for this AOQL. Each of these points yielded an 
AOQL value. These points were chosen in such a manner that the desired 
AOQL = .10 was included between the smallest and largest of the four AOQL’s. 
By an Aitken interpolation the pair $(f, i)$ corresponding to the desired AOQL 
value was obtained. Any number of points on the specified contour can be 
obtained in this manner. We have lightly passed over the tedious job of 
evaluating $q$ by (25) and then the AOQL by (24a) for any fixed pair $(f, i)$. Actually, 
an electronic digital computer was employed for these two steps. The contours 
of constant AOQL for $k = 2$ were produced in this way and are given in Fig. 2. 


Also, some of these are contrasted with contours for $k = 1$ and $k = \infty$ in 
Fig. 3. 
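The two-step computation described above (locate the maximizing $q$, then evaluate the AOQ there) can be imitated with a direct numerical search in place of solving (25). This is a sketch under stated assumptions, not the authors' procedure: the two-level AOQ is written in terms of $q^i$ as derived from the $k = 2$ case of the MLP formula, the AOQ is assumed to have a single peak near the best grid point, and the function names are hypothetical.

```python
def aoq_two_level(p, f, i):
    """AOQ of the two-level plan, expressed through h = q**i."""
    h = (1.0 - p) ** i
    denominator = f * f + f * (1.0 - 2.0 * f) * h + (1.0 - f + f * f) * h * h
    return p * h * (1.0 - f) * (f + h) / denominator


def aoql_two_level(f, i, grid=20000):
    """Return (p_star, AOQL): coarse grid search, then ternary refinement."""
    best_p = max((j / grid for j in range(1, grid)),
                 key=lambda p: aoq_two_level(p, f, i))
    lo, hi = best_p - 1.0 / grid, best_p + 1.0 / grid
    for _ in range(100):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if aoq_two_level(a, f, i) < aoq_two_level(b, f, i):
            lo = a
        else:
            hi = b
    p_star = 0.5 * (lo + hi)
    return p_star, aoq_two_level(p_star, f, i)
```

For $f = 0.1$ and $i = 10$ the search gives an AOQL of roughly 0.14, which lies between the single-level value (about 0.10) and the infinite-level value $1 - (f/(1+f))^{1/i}$ (about 0.21).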


8. Contours of constant AOQL when $2 < k < \infty$. The computation of 
contours of constant AOQL for $k > 2$ soon becomes forbidding if the method for 
obtaining contours of constant AOQL for $k = 2$ is applied. However, it is 
interesting and fruitful to explore the kind of interpolation necessary to 
reproduce the $k = 2$ contours from knowledge of the one sampling level and infinite 
sampling level contours. For $k = 1$ and $k$ infinite we can explicitly write $f$ in 
terms of $i$ and AOQL, namely 

$$f_1 = \frac{(1 - A)^{i_1+1}}{(1 - A)^{i_1+1} + \left(1 + \dfrac{1}{i_1}\right)^{i_1+1} i_1 A} \tag{26}$$

and 

$$f_\infty = \frac{(1 - A)^{i_\infty}}{1 - (1 - A)^{i_\infty}} \tag{27}$$

where $A$ is the AOQL and the subscripts on the $f$’s and $i$’s refer to the number 
of levels used. 

TABLE I 
Comparison between exact ($f_E$) and approximate ($f_A$) values of $f$ for $k = 2$ 


AOQL    $i$    $f_E$    $f_A$    $f_E - f_A$ 


10 .0906 | .0037 


-0850 
-0355 
.0176 


.3193 
. 1370 
-0431 
-0191 


.1703 
0670 -0665 
.0210 -0209 





.1830 .1779 
.0740 .0702 
.0238 .0239 





1850 | .1801 
180 | .0690 | .0658 
255 .0270 .0259 


225 .1799 -1749 
350 .0740 -0708 
510 .0270 .0316 


If for fixed $i = i_1 = i_2 = i_\infty$, we write 

$$f_2 = f_\infty\left[1 - (1/2)^{1/3}\right] + f_1 (1/2)^{1/3} \tag{28}$$

we get a point $(f_2, i_2)$ which falls almost exactly on the contour of constant 
AOQL for $k = 2$. This is demonstrated in Table I. In other words, harmonic 







cube root interpolation is appropriate, that is, for fixed AOQL and $i$, the set 
$f_1, f_2, f_\infty$ is proportional to $1, (1/2)^{1/3}, 0$. This was of course discovered by trial 
and error but it also presents a reasonable way for obtaining any of the sixteen 
specified contours of constant AOQL for any fixed $k$ by using the explicit known 
values for $k = 1$ and $k$ infinite together with the assumption that $f_1, f_k, f_\infty$ is 
proportional to $1, (1/k)^{1/3}, 0$; or 

$$f_k = f_\infty\left[1 - (1/k)^{1/3}\right] + f_1 (1/k)^{1/3}. \tag{29}$$
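The interpolation rule is easy to put into code. The sketch below takes the single-level rate in the closed form that follows from maximizing the Dodge AOQ, $f_1 = (1-A)^{i+1}/[(1-A)^{i+1} + (1+1/i)^{i+1} i A]$, and the infinite-level rate $f_\infty = (1-A)^i/[1-(1-A)^i]$, then blends them with weight $(1/k)^{1/3}$. The function names are hypothetical, and a common clearance number $i$ is assumed for all three plans.

```python
def f_single(aoql, i):
    """Sampling rate of the single-level plan with the given AOQL."""
    s = (1.0 - aoql) ** (i + 1)
    return s / (s + (1.0 + 1.0 / i) ** (i + 1) * i * aoql)


def f_infinite(aoql, i):
    """Initial sampling rate of the infinite-level plan with the given AOQL."""
    s = (1.0 - aoql) ** i
    return s / (1.0 - s)


def f_interpolated(aoql, i, k):
    """Approximate initial rate for k levels by cube root interpolation."""
    w = (1.0 / k) ** (1.0 / 3.0)
    return f_infinite(aoql, i) * (1.0 - w) + f_single(aoql, i) * w
```

By construction `f_interpolated(aoql, i, 1)` equals `f_single(aoql, i)`, and the value moves toward `f_infinite(aoql, i)` as `k` grows.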


9. Choice of a MLP plan. Assuming that harmonic cube root interpolation is 
a satisfactory method for obtaining contours of constant AOQL for any fixed 
k, there still remains the task of defining valid and reasonable criteria to be 
employed in the selection of a MLP plan; for, given a process average, an in- 
finite number of such plans exist which can guarantee the attainment of any 
specified AOQL. Contract specifications, administrative considerations, or 
psychological grounds can impose a lower bound on the amount of inspection 
or an upper bound on the number of sampling levels and thus curtail the total 
number of possible plans. A lower bound on the amount of inspection may also 
be required to quickly detect the malfunctioning of the production process. 
Also, it is evident from Figs. 1, 2, and 3 that Dodge in his single level plan con- 
siders f > 1/2 and f < 1/100 as unrealistic, and the authors in MLP consider 
the same region unrealistic for initial sampling rates and thus, large groups of 
plans are ignored. 

In addition there are cost considerations which our continuous inspection 
scheme must consider and these will also influence the choice of plans. We will 
now discuss two types of cost criteria and their effect on the choice of a MLP 
plan. These criteria are (a) minimum AFI; and (b) local stability, which we 
specifically define as maintained as long as 

$$P\{d_N > NA\} \le \alpha \tag{30}$$

where $d_N$ is the number of defects remaining in a sequence of $N$ items ($N$ large) 
which have gone through the inspection process, $A$ is the desired AOQL, and 
$\alpha$ is the tolerated risk. This definition of local stability is a quantification of 
the notion expressed in [2]. While only the single sampling level and the infinite 
sampling level plans will be explicitly analyzed, some implications will remain for 
any MLP plan. 

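We now turn to the AFI functions for $k = 1$ and $k$ infinite. For simplification 
write AFI $= F$ and we get for $k = 1$ 

$$F_1 = \frac{f_1}{f_1 + (1 - f_1)(1 - p)^{i_1}} \tag{31}$$

where $f_1$ is defined by (26). Thus 

$$F_1 = \frac{(1 - A)^{i_1+1}}{(1 - A)^{i_1+1} + \left(1 + \dfrac{1}{i_1}\right)^{i_1+1} i_1 A\,(1 - p)^{i_1}}. \tag{32}$$

For $k$ infinite, we get 

$$F_\infty = \begin{cases} 1 - \left[\dfrac{1}{f_\infty} - 1\right]\left[\dfrac{(1 - p)^{i_\infty}}{1 - 2(1 - p)^{i_\infty}}\right], & p \ge 1 - (f_\infty/(1 + f_\infty))^{1/i_\infty}, \\ 0, & \text{otherwise,} \end{cases} \tag{33}$$

where $f_\infty$ is defined by (27). Thus 

$$F_\infty = \begin{cases} 1 - \left(\dfrac{1 - p}{1 - A}\right)^{i_\infty}\left[\dfrac{1 - 2(1 - A)^{i_\infty}}{1 - 2(1 - p)^{i_\infty}}\right] & \text{for } p \ge A, \\ 0 & \text{for } p < A. \end{cases} \tag{34}$$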


When the process average, $p$, is less than or equal to the desired AOQL, the 
question of minimum AFI is easily resolved in favor of the infinite sampling 
level scheme. On the other hand, when $p$ exceeds the AOQL, it is evident from 
(32) and (34) that $F_1$ can be made smaller than $F_\infty$, within the ranges of $i_1$ and 
$i_\infty$ dictated by a specific choice of $A$. Table II gives some numerical 
illustrations. 
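The two AFI functions can be evaluated directly once the sampling rates are tied to the desired AOQL $A$. The sketch below is illustrative only: the function names are hypothetical, the single-level rate uses the closed form that follows from maximizing the Dodge AOQ, and the infinite-level branch assumes $(1-A)^{i} < 1/2$ so that the denominators stay positive.

```python
def afi_single(p, aoql, i):
    """AFI of the single-level plan whose rate is fixed by the AOQL."""
    s = (1.0 - aoql) ** (i + 1)
    f1 = s / (s + (1.0 + 1.0 / i) ** (i + 1) * i * aoql)
    return f1 / (f1 + (1.0 - f1) * (1.0 - p) ** i)


def afi_infinite(p, aoql, i):
    """AFI of the infinite-level plan; zero when p is below the AOQL."""
    if p < aoql:
        return 0.0
    ratio = ((1.0 - p) / (1.0 - aoql)) ** i
    return 1.0 - ratio * (1.0 - 2.0 * (1.0 - aoql) ** i) / (1.0 - 2.0 * (1.0 - p) ** i)
```

Comparing `afi_single(p, A, i1)` with `afi_infinite(p, A, i_inf)` over a range of clearance numbers is one way to explore the trade-off described above for $p > A$.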


TABLE II 
Minimum values of $F_1$ and $F_\infty$ for selected values of process average 
and AOQL ($p > A$) 


A is oh ae lence tuasd Fe 





10 ; hae 05 34 | 38 ! .69 
10 ; 7 ie ae | 4 a 1 ee 
.02 | 97 | 68 , psig 33 | .67 
.02 eo 47 xi ; .50 .86 
005 |  .008 330 | 269 ’ a a or. | a 
.0005 | .0008 3331 | 2694 : bi tl 26: «|.» 4B 


Let us now examine the single sampling level and infinite sampling level 
plans for local stability. For a sequence of $N$ items ($N$ large, $p(1 - F)$ small), 
$d_N$ can be approximated by a Poisson distribution with mean $Np(1 - F)$. Thus 
from (30) and the normal approximation to the Poisson we get 

$$\frac{NA - Np(1 - F)}{[Np(1 - F)]^{1/2}} \ge K_\alpha, \tag{35}$$

where $K_\alpha$ is the $(1 - \alpha)$th percentile of the normal distribution with zero mean 
and unit variance. Solving the equality in (35) for $p(1 - F) = \mathrm{AOQ}$ we find 

$$p(1 - F) = \frac{2NA + K_\alpha^2 \pm [4NAK_\alpha^2 + K_\alpha^4]^{1/2}}{2N}, \tag{36}$$

but since $p(1 - F) \le A$, the positive square root must be discarded. The 
solution to the inequality in (35) is 

$$p(1 - F) \le A + \frac{K_\alpha^2}{2N} - \left[\frac{K_\alpha^4}{4N^2} + \frac{AK_\alpha^2}{N}\right]^{1/2} = C(A, N, \alpha). \tag{37}$$
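The bound $C(A, N, \alpha)$ is straightforward to compute. A minimal sketch, using the standard library's normal quantile for $K_\alpha$ (the function name `stability_bound` is hypothetical):

```python
from math import sqrt
from statistics import NormalDist


def stability_bound(aoql, n_items, alpha):
    """Largest AOQ = p(1 - F) for which P{d_N > N*A} <= alpha holds
    under the normal approximation to the Poisson."""
    k_alpha = NormalDist().inv_cdf(1.0 - alpha)
    k2 = k_alpha * k_alpha
    return (aoql + k2 / (2.0 * n_items)
            - sqrt(k2 * k2 / (4.0 * n_items**2) + aoql * k2 / n_items))
```

For $A = 0.02$, $N = 10{,}000$ and $\alpha = 0.05$ this gives a bound a little under 0.018, so the AOQ must sit noticeably below the AOQL for local stability to hold.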


For $k = 1$, we obtain 

$$\mathrm{AOQ}_1 = p(1 - F_1) = p\left[1 - \frac{(1 - A)^{i_1+1}}{(1 - A)^{i_1+1} + \left(1 + \dfrac{1}{i_1}\right)^{i_1+1} i_1 A\,(1 - p)^{i_1}}\right]. \tag{38}$$

When $p \le C(A, N, \alpha)$, (37) is satisfied for all values of $i_1$ and thus local 
stability is guaranteed by all plans for $k = 1$. For $C(A, N, \alpha) < p \le A$ there exists 
a value $i_1^*$, given $p$ and $A$, such that for all $i_1 < i_1^*$ local stability is maintained. 
When $p > A$, then there exists a value $i_1^{**}$, given $p$ and $A$, such that for all 
$i_1 \ge i_1^{**}$ local stability is maintained. Thus, given any values for $p$ and $A$, it 
is always possible to find plans which yield local stability when $k = 1$; and 
when quality is exceptionally good all single sampling level plans have this 
property. This is not surprising since the Dodge plan represents the tightest 
inspection plan of all MLP plans. 


For the infinite sampling level plan, we obtain 

$$\mathrm{AOQ}_\infty = p(1 - F_\infty) = \begin{cases} p & \text{when } p < A, \\ p\left(\dfrac{1 - p}{1 - A}\right)^{i_\infty}\left[\dfrac{1 - 2(1 - A)^{i_\infty}}{1 - 2(1 - p)^{i_\infty}}\right] & \text{for } p \ge A. \end{cases} \tag{39}$$

When $p \le C(A, N, \alpha)$, (37) is satisfied for all values of $i_\infty$, and local stability is 
guaranteed for all infinite sampling level plans. For $A > p > C(A, N, \alpha)$, (37) 
is never satisfied and local stability is never maintained. This is also true for 
$p = A$. When $p > A$, there exists a value $i_\infty^*$, given $p$ and $A$, such that for all 
$i_\infty \ge i_\infty^*$ local stability is maintained. Thus, as in the Dodge plan, when quality 
is exceptionally good, all infinite sampling level plans have the desired 
property. On the other hand, when quality is good but hovers just short of the 
AOQL, local stability cannot be maintained. However, when the process average 
exceeds the desired AOQL, it is possible to find some infinite sampling level plans which 
will maintain local stability. 

When quality is exceptionally good, i.e., $p \le C(A, N, \alpha)$, then the infinite 
sampling level plan guarantees local stability and has minimum AFI. It also 
seems plausible that if a choice between the Dodge plan and, say, $k = 3$ is 
desired, then the decision should be in favor of $k = 3$ since it will guarantee local 
stability and a smaller minimum AFI. However, if $C(A, N, \alpha) < p \le A$, the 
choice between the Dodge plan and infinite $k$ is not easily resolved, for while 
the latter has minimum AFI, it does not have local stability. If quality is poor, 
$p > A$, then the Dodge plan is to be preferred. 
JOINT DISTRIBUTIONS OF TIME INTERVALS FOR THE OCCURRENCE 
OF SUCCESSIVE ACCIDENTS IN A GENERALIZED 
POLYA SCHEME¹ 


By Grace E. Bates 
Mount Holyoke College and University of California 


0. Summary. The main body of this paper has two rather distinct parts, the 
first part (Section 2) containing the derivation of an explicit representation for 
the joint distribution of accident times in quite a general situation, and the re- 
mainder of the paper dealing with a very special case which leads to a discussion 
of testing the hypothesis that a distribution is uniform over (0, 1) against alter- 
natives which are exponential truncated to (0, 1). The second part may be read 
profitably with little reference to the probabilistic arguments of Section 2. 


1. Introduction. In [1] a comparison was made between two models often used 
in studies of accident proneness. The first model, due to Greenwood, Yule, and 
Newbold [2], [3], postulates variability among the individuals of a population 
with respect to accident proneness, assumes that previous accidents do not 
change the probabilities of future accidents and that experience gained in the 
particular occupation giving rise to the risk of these accidents does not modify 
these probabilities. The combined term ‘‘mixture-no contagion-no time effect” 
model was used to symbolize this first scheme. The second model, due to Polya 
[4], postulates identity of the individuals with respect to accident proneness, pos- 
sible presence of contagion, and possible effect of experience gained since enter- 
ing the particular occupation. 

Using in [1] the scheme of Polya in a slightly more generalized form, it was 
shown that the multivariate distribution of the number of accidents incurred in 
several successive periods of observation, as soon as two or more periods of ob- 
servation were used, was distinguishable from the corresponding distribution 
implied by the mixture-no contagion-no time effect scheme, barring an excep- 
tional particular case. 

The last section of [1] considered the same problem of distinguishing between 
the two models when the random variables used were the time intervals be- 
tween successive accidents incurred by an individual in one period of observa- 
tion. In formulating this scheme it was found possible to liberalize a little the 
scheme of Polya by not insisting that the contagion be a linear type. This ap- 
proach was applied only to the case of individuals who, during the period of 
observation, sustain exactly one accident. 
The present paper applies the outline given in Section 10 of [1] to the general 
case in which one makes use of data relating to several groups, say $G_j$, $j = 1, 
2, \dots, k$, of individuals, each group composed of individuals who sustain the 
same number, $n_j$, of accidents in the period of observation. 

Section 2 of the present paper establishes, for an individual who sustains n 
successive accidents in the period of observation, the joint distribution of the 
time intervals measured from the beginning of the period to the occurrence of 
each accident. Some interesting special cases of this distribution are enumerated 
and the remainder of the paper is concerned with a comparison of one of these 
special cases with the case of no mixture-no contagion-no time effect. The spe- 
cial case selected is one implying that there is no mixture-no time effect and 
that contagion, when present, is of the linear type. One advantage of dealing 
with the case of linear contagion is that it is not necessary in this case to assume 
that all the individuals under consideration have sustained the same number of 
accidents prior to the period of observation. 

In Section 3, preparatory to constructing tests of the hypothesis of no mixture-no 
contagion-no time effect versus alternatives of no mixture-linear contagion-no 
time effect, the distribution of the mean of the random variables $T_i$, 
representing the length of the time interval from the start of a unit period of 
observation to the occurrence of the $i$-th accident, is determined under the 
hypothesis tested and under the alternative hypotheses. The distribution of the mean 
of the $T_i$’s under the hypothesis tested is the well-known distribution of the mean 
of $n$ completely independent random variables, each uniformly distributed on 
(0, 1). 

Section 4 considers the construction of these tests. It is found that there are 
uniformly most powerful tests of the hypothesis of no mixture-no time effect-no 
contagion against each of the classes of one-sided alternatives (termed “positive 
linear contagion” and “negative linear contagion”, respectively) and a 
uniformly most powerful unbiased test of the original hypothesis versus the set of 
alternatives, no mixture-no time effect-linear contagion. Given accident data, 
including times of occurrence of the accidents relating to several groups of indi- 
viduals, each group containing individuals who sustain the same number of 
accidents in the period of observation, the statistic required for the tests is the 
grand mean of all the time intervals for all the individuals. 

In the last section we treat the problem of computing the power of the uni- 
formly most powerful unbiased test. The exact power function is obtained ex- 
plicitly from the distributions of Section 2. However, since actual computation 
of this power is very tedious, an approximation to the power function is de- 
sirable. This approximation is effected in two stages: first, the critical region 
boundaries for a specified test level are approximated by using the normal 
approximation to the distribution of the mean of the time intervals under the 
hypothesis of no mixture-no time effect-no contagion—i.e., to Laplace’s dis- 
tribution of the mean of n completely independent random variables uniform in 
(0, 1). Then the Central Limit Theorem is applied to the distribution of the mean 
in the set of alternative hypotheses to find the approximate power of this test. 
2. Joint distribution of time intervals for n successive accidents sustained by 
an individual in one period of observation. As in Section 10 of [1] we consider an 
individual J who, from the moment $t = 0$ on, is exposed to the risk of nonfatal 
accidents of a particular kind. For this individual we shall consider probabilities 
$P_{m,n}(T_1, T_2)$ defined as follows, for $0 \le T_1 \le T_2$: $P_{m,n}(T_1, T_2)$ is the conditional 
probability that, during the time interval $(T_1, T_2)$, the individual J will incur 
exactly $n$ accidents, given that at time $T_1$ or before he had sustained exactly $m$ 
accidents. We impose on the probabilities $P_{m,n}(T_1, T_2)$ the three postulates given 
below. The totality of these three postulates we shall describe as the Polya 
contagious scheme. 

POSTULATE 1. If $T_2 \to T_1$, then all the probabilities $P_{m,n}(T_1, T_2)$ converge to 
limits $P_{m,n}(T_1, T_1)$. More specifically, 

$$P_{m,0}(T_1, T_1) = 1 \quad \text{for every } m, \tag{1}$$

and consequently, 

$$P_{m,n}(T_1, T_1) = 0 \quad \text{for every } m, \text{ and for } n \ge 1. \tag{2}$$

POSTULATE 2. The probabilities $P_{m,n}(T_1, T_2)$ depend on the number $m$ of accidents 
sustained up to and including the moment $T_1$ and also on the value of $T_1$, but not 
on the moments at which these previous accidents occurred. 

POSTULATE 3. At least at $T_2 = T_1$, the probabilities $P_{m,n}(T_1, T_2)$ are differentiable 
with respect to $T_2$, and specifically, 

$$\left.\frac{\partial}{\partial T_2} P_{m,n}(T_1, T_2)\right|_{T_2 = T_1} = \begin{cases} -\dfrac{\lambda_m}{1 + \nu T_1}, & \text{if } n = 0, \\ \dfrac{\lambda_m}{1 + \nu T_1}, & \text{if } n = 1, \\ 0, & \text{if } n > 1, \end{cases} \tag{3}$$

where $\nu \ge 0$ and $\lambda_0, \lambda_1, \dots, \lambda_m, \dots$ are arbitrary positive numbers, with possibly 
$\lambda_{m+k} = 0$, $k \ge M$, for some positive integer $M$. 

In applying the probabilistic scheme formulated above, one may consider the 
probability space as the accident histories of a large population of individuals. 
Then, at the outset, the conditions above require that the $\lambda_m$’s and $\nu$’s be the 
same for all individuals. It will be pointed out later, however, that the tests of 
the hypothesis of no mixture-no time effect-no contagion in the special set of 
alternatives of no mixture-no time effect-linear contagion derived in this paper, 
imply only that $\nu$ be zero for each individual and that each individual have the 
same constant increment $\gamma = \lambda_{m+1} - \lambda_m$ in the sequence of his $\lambda_m$’s. 

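Following the usual procedure ([5], Chapter 17) one obtains the differential
equations

(4)  $\dfrac{\partial}{\partial T_2} P_{m,0}(T_1, T_2) = -\dfrac{\lambda_m}{1 + \nu T_2}\, P_{m,0}(T_1, T_2),$

(5)  $\dfrac{\partial}{\partial T_2} P_{m,n}(T_1, T_2) = -\dfrac{\lambda_{m+n}}{1 + \nu T_2}\, P_{m,n}(T_1, T_2) + \dfrac{\lambda_{m+n-1}}{1 + \nu T_2}\, P_{m,n-1}(T_1, T_2)$

for $n \ge 1$.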

Under initial conditions given by Postulate 1, the solution of (4) is

(6)  $P_{m,0}(T_1, T_2) = \left(\dfrac{1 + \nu T_1}{1 + \nu T_2}\right)^{\lambda_m/\nu},$

and the solution (cf. [6], pp. 406-407, where the general solution of a similar set
of differential equations is given) of (5) when all the $\lambda$'s are unequal and $\nu \ne 0$ is

(7)  $P_{m,n}(T_1, T_2) = (-1)^n \lambda_m \lambda_{m+1} \cdots \lambda_{m+n-1} \sum_{k=0}^{n} \left(\dfrac{1 + \nu T_1}{1 + \nu T_2}\right)^{\lambda_{m+k}/\nu} D_{k,n}^{-1}$

for $n \ge 1$, with

(8)  $D_{k,n} = \prod_{j=0,\; j \ne k}^{n} (\lambda_{m+k} - \lambda_{m+j}).$

Since the set of equations (4) and (5) may be solved recursively for the
$P_{m,n}(T_1, T_2)$, the solutions (6) and (7) are unique. From the familiar formulas for
solving linear differential equations of this type it is easily seen that the solutions
$P_{m,n}(T_1, T_2)$ are non-negative. Furthermore, it is easily verified that

(9)  $\sum_{n=0}^{\infty} P_{m,n}(T_1, T_2) \le 1.$

If the system of equations (4), (5) is finite (i.e., $\lambda_{m+k} = 0$ for $k \ge M$) then
$P_{m,n}(T_1, T_2) = 0$ for $n > M$ and the equality holds in (9) so that the solutions
(6) and (7) for $0 \le n \le M$ form a proper probability distribution.
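
As a numerical illustration of (6), (7) and (8), the following Python sketch evaluates
$P_{m,n}(T_1, T_2)$ directly from the closed form. The function name `p_mn` and its
argument layout are illustrative assumptions, not part of the paper; the formula is
used only where it applies, that is, for distinct $\lambda$'s and $\nu > 0$.

```python
import math

def p_mn(lams, nu, T1, T2):
    """P_{m,n}(T1, T2) from (6)-(8).

    lams = [lambda_m, ..., lambda_{m+n}] (all distinct), nu > 0.
    The value of n is taken to be len(lams) - 1.
    """
    n = len(lams) - 1
    r = (1.0 + nu * T1) / (1.0 + nu * T2)
    if n == 0:
        return r ** (lams[0] / nu)              # equation (6)
    total = 0.0
    for k in range(n + 1):
        d = 1.0                                  # D_{k,n} of equation (8)
        for j in range(n + 1):
            if j != k:
                d *= lams[k] - lams[j]
        total += r ** (lams[k] / nu) / d
    # equation (7): (-1)^n * lambda_m ... lambda_{m+n-1} * sum
    return (-1) ** n * math.prod(lams[:n]) * total
```

With the assumed values $\lambda_{m+k} = 1 + 0.5k$, $\nu = 0.2$, $T_1 = 1$, $T_2 = 2$, summing `p_mn` over
n = 0, ..., 20 gives a total equal to 1 up to rounding, as (9) with equality would
suggest. The alternating sum in (7) loses precision for large n, so this direct
evaluation is suitable only for moderate n.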

In the case of an infinite set of equations (4), (5), it may happen that (9) is
a true inequality. This type of situation has been discussed by Feller ([5], pp.
369-371) as implying nonzero probability of the occurrence of an infinite number
of events in the finite period. Feller derives a necessary and sufficient condition
for the equality in (9) to hold. Using equations (4) and (5), Feller's argument
goes through almost verbatim to yield the same result: namely, a necessary
and sufficient condition for the equality in (9) to hold is that the series
$\sum_{k=0}^{\infty} \lambda_{m+k}^{-1}$ diverges.

In the application of the distribution of $P_{m,n}$'s made in the remainder of this
paper, it will be seen that we have either a finite set of equations (4) and (5), or,
in the infinite case, the divergence condition on the $\lambda_{m+k}$'s is fulfilled.

Forms obtained from solutions (6) and (7) by a passage to the limit as $\nu \to 0$
and/or as some or all of the $\lambda$'s become equal, may be shown by direct verification
also to satisfy (4) and (5), a fact which will be used later in this paper.

It is clear that $P_{m,0}(T_1, T_2)$ is a decreasing function of $\lambda_m$. Furthermore, if
all the $\lambda$'s have the same value, the model implies the absence of contagion in
the accidents. As in [1], if the $\lambda$'s form an increasing sequence
$\lambda_0 < \lambda_1 < \cdots < \lambda_m < \cdots$, we use the term "regular positive contagion", meaning
that the more accidents the individual had in the past, the more intense his risk of
accidents in the future. If the $\lambda$'s form a monotonic decreasing sequence, we speak of
"regular negative contagion": past accidents "teach" the individual how better to
avoid accidents in the future. We use the term "irregular contagion" for a
non-monotone sequence of $\lambda$'s. The constant $\nu$ in the probabilities $P_{m,n}(T_1, T_2)$ is
termed "time effect" in the model and may be attributed to the effect of previous
experience gained by the individual in the occupation which gives rise to
the risks of these particular accidents. From (3) it may be seen that with
increasing $T_1$ and $\lambda_m > 0$ the rate of increase of the probabilities of further
accidents in $(T_1, T_2)$ is slowed down by the presence of $\nu > 0$. When $\nu = 0$ there
is an absence of time effect.

We now assume that an individual J is observed for a unit of time from $T_1$ to
$T_1 + 1$. The first problem considered is that of finding the joint distribution of
random variables $\tau_i$ defined as follows: $\tau_i$ is the time from $T_1$ to the occurrence
of the i-th accident for the individual J. We are concerned with the joint
distribution of $\tau_1 < \tau_2 < \cdots < \tau_n$ conditional on the occurrence of n accidents in
the period of observation and m accidents previous to this period.

In order to solve the problem under consideration, we now compute the
probability of the following two events, each of which is to be understood as
conditioned by the occurrence of exactly m accidents at or before $T_1$:

(i) J incurs exactly n accidents in $(T_1, T_1 + 1)$.

(ii) J incurs n accidents in $(T_1, T_1 + 1)$ and the random variables $\tau_i$ satisfy
conditions $\tau_i \le t_i$, $i = 1, 2, \cdots, n$ with $0 < t_1 < t_2 < \cdots < t_n < 1$.

The probability of (i) is $P_{m,n}(T_1, T_1 + 1)$, and from (7) we have

(10)  $P_{m,n}(T_1, T_1 + 1) = (-1)^n \lambda_m \lambda_{m+1} \cdots \lambda_{m+n-1} \sum_{k=0}^{n} \left(\dfrac{a}{a + \nu}\right)^{\lambda_{m+k}/\nu} D_{k,n}^{-1},$

where, for convenience,

(11)  $a = 1 + \nu T_1.$

The probability of (ii) is

(12)  $\sum_{\{J_k\}} \prod_{k=0}^{n} P_{m+J_k,\, J_{k+1}-J_k}(T_1 + t_k, T_1 + t_{k+1})$

with $t_0 = 0$ and $t_{n+1} = 1$, and where the sum is over all sequences $\{J_k\}$ such that
$J_0 = 0$, $J_n = J_{n+1} = n$, with $\{J_k\}$ a non-decreasing sequence of integers, each
$J_k \ge k$ for $k = 0, 1, \cdots, n$.

The joint density function of the $\tau_i$'s conditioned by the occurrence of m
previous accidents and n accidents in $(T_1, T_1 + 1)$ will be the n-th partial
derivative with respect to $t_1, t_2, \cdots, t_n$ of the expression in (12), divided by
$P_{m,n}(T_1, T_1 + 1)$ given in (10). Of the terms in the sum in (12), the only term which
does not contribute zero to this density is

(13)  $\prod_{k=0}^{n-1} P_{m+k,1}(T_1 + t_k, T_1 + t_{k+1})\; P_{m+n,0}(T_1 + t_n, T_1 + 1),$

i.e., the term involving the sequence of $J_k$'s with $J_k = k$, $0 \le k \le n$. This may
be seen as follows:

Consider the term in the sum in (12) for any one of the possible sequences of
$J_k$'s for which $J_{n-1} = n$. Then the first difference with respect to $t_n$ of this term
is identically zero, since, for $0 < \epsilon < 1 - t_n$, we have for this difference:

(14)  $\prod_{k=0}^{n-2} P_{m+J_k,\, J_{k+1}-J_k}(T_1 + t_k, T_1 + t_{k+1}) \cdot \{P_{m+n,0}(T_1 + t_{n-1}, T_1 + t_n + \epsilon)\, P_{m+n,0}(T_1 + t_n + \epsilon, T_1 + 1) - P_{m+n,0}(T_1 + t_{n-1}, T_1 + t_n)\, P_{m+n,0}(T_1 + t_n, T_1 + 1)\}$

where the expression in the braces is certainly zero. Hence only terms of the sum
in (12) for which the sequence of $J_k$'s has $J_{n-1} = n - 1$ will contribute to the
density function.

Next, considering sequences of $J_k$'s with $J_{n-1} = n - 1$, we take the class of
those for which $J_{n-2} = n - 1$. But for each such sequence the corresponding term in
the sum in (12) will have second difference with respect to $t_n, t_{n-1}$ identically
zero and hence in order for a sequence of $J_k$'s to contribute non-zero density,
we must have also $J_{n-2} = n - 2$. Continuing inductively, we find that unless
the particular sequence of $J_k$'s is chosen for which $J_k = k$, $0 \le k \le n$, the n-th
partial derivative in (12) with respect to $t_n, t_{n-1}, \cdots, t_1$ is 0.

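Differentiating (13) with respect to $t_n, t_{n-1}, \cdots, t_1$ and dividing by (10), we
have the density function

(15)  $p_{\tau_1, \tau_2, \cdots, \tau_n}(t_1, t_2, \cdots, t_n \mid \psi_m, \psi_{m+1}, \cdots, \psi_{m+n-1}; \nu) = \dfrac{\prod_{k=1}^{n} (1 + \nu t_k/a)^{(\psi_{m+k-1}/\nu) - 1}}{a^n \sum_{k=0}^{n} (-1)^k (1 + \nu/a)^{(\psi_{m+k} + \cdots + \psi_{m+n-1})/\nu} R_{k,n}^{-1}}$

with

(16)  $R_{k,n} = \{(\psi_m + \psi_{m+1} + \cdots + \psi_{m+k-1})(\psi_{m+1} + \cdots + \psi_{m+k-1}) \cdots (\psi_{m+k-2} + \psi_{m+k-1})\, \psi_{m+k-1} \cdot \psi_{m+k}\, (\psi_{m+k} + \psi_{m+k+1}) \cdots (\psi_{m+k} + \psi_{m+k+1} + \cdots + \psi_{m+n-1})\},$

and

(17)  $\psi_{m+k} = \lambda_{m+k+1} - \lambda_{m+k}; \quad k = 0, 1, \cdots, n - 1.$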


The following special cases are of interest and were obtained by considering 
the limiting forms of the density function (15). The equivalence of these results 
to those obtained by a passage to the limit before differentiation may be verified 
directly. 
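
Case (i) No mixture-possible time effect-no contagion:

(18)  $p_{\tau_1, \tau_2, \cdots, \tau_n}(t_1, t_2, \cdots, t_n \mid \psi_i = \psi = 0; \nu) = n! \left\{\dfrac{\nu}{a \log_e (1 + \nu/a)}\right\}^n \prod_{k=1}^{n} (1 + \nu t_k/a)^{-1}.$

Case (ii) No mixture-no time effect-possible contagion:

(19)  $p_{\tau_1, \tau_2, \cdots, \tau_n}(t_1, t_2, \cdots, t_n \mid \psi_m, \cdots, \psi_{m+n-1}; \nu = 0) = \dfrac{\exp\left(\sum_{k=1}^{n} \psi_{m+k-1} t_k\right)}{\sum_{k=0}^{n} (-1)^k \exp(\psi_{m+k} + \cdots + \psi_{m+n-1})\, R_{k,n}^{-1}}.$

Case (iii) No mixture-possible time effect-linear contagion:

(20)  $p_{\tau_1, \tau_2, \cdots, \tau_n}(t_1, t_2, \cdots, t_n \mid \psi_m = \psi_{m+1} = \cdots = \psi_{m+n-1} = \psi; \nu) = \dfrac{n!\, \psi^n}{a^n \{(1 + \nu/a)^{\psi/\nu} - 1\}^n} \prod_{k=1}^{n} (1 + \nu t_k/a)^{(\psi/\nu) - 1}.$

Case (iv) No mixture-no time effect-linear contagion:

(21)  $p_{\tau_1, \tau_2, \cdots, \tau_n}(t_1, t_2, \cdots, t_n \mid \psi_i = \psi; \nu = 0) = n! \left(\dfrac{\psi}{e^{\psi} - 1}\right)^n \exp\left(\psi \sum_{k=1}^{n} t_k\right).$

Case (v) No mixture-no time effect-no contagion:

(22)  $p_{\tau_1, \tau_2, \cdots, \tau_n}(t_1, t_2, \cdots, t_n \mid \psi_i = \psi = 0; \nu = 0) = n!.$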

We note that the joint density function (15) takes the form of that in (22) also
when all the $\psi_i$ are equal to $\psi$ and $\psi = \nu$, whether or not $\psi = 0$. In this case then,
the presence or absence of contagion is unidentifiable.

For the remainder of the present paper we shall be concerned with a
comparison of the models implied by (21) and (22), so that contagion is absent if
and only if $\psi = 0$. We first note some of the implications arising from the model
implied by (21).

It is clear from the model that for $\psi_i = \psi$, the contagion is of the regular
positive type for $\psi > 0$ and is regular negative contagion for $\psi < 0$. Furthermore, it
is obvious that this condition on the $\psi_i$ implies that the contagion is linear;
specifically,

(23)  $\lambda_{m+k} = \lambda_m + k\psi.$


To see the effect of this linearity more clearly, we return to (6) and see that
for $\nu = 0$,

(24)  $P_{m,0}(T_1, T_2) = e^{-\lambda_m (T_2 - T_1)}$

so that, in the unit time interval $(T_1, T_1 + 1)$, we have

(25)  $e^{-\psi} = \dfrac{P_{m+k+1,0}(T_1, T_1 + 1)}{P_{m+k,0}(T_1, T_1 + 1)}.$

That is, $e^{-\psi}$ is the factor by which the probability of avoiding accidents in a
unit time interval, for an individual J who had previously sustained m + k
accidents, must be multiplied in order to find the probability of his avoiding
accidents in this time interval had he sustained m + k + 1 previous accidents.
Thus, for example, a value of $\psi$ of about -0.7 means that an increase of 1 in
the number of previous accidents would double his chances of avoiding further
accidents in this period, while a $\psi$ of 0.7 means that a similar increase in the
number of previous accidents would halve his chances of avoiding further
accidents in the period. The actual number of previous accidents sustained by the
individual, of course, still determines his probability of avoiding accidents in
the period of observation, but the condition that all the $\psi_i$ be equal implies that
the relative increase or decrease of this probability depends only on the increase
in the number of previous accidents. One important consequence of the
condition that all the $\psi_i$'s be equal (that is, that the contagion be linear) is that in
testing the hypothesis of no contagion versus contagion of this type it is not
necessary to assume that all the individuals under observation have incurred the
same number of accidents previously or that $\lambda_m$ is the same for each individual.
We now examine the probability distribution given by the $P_{m,n}(T_1, T_2)$ in
the case under consideration of no mixture-no time effect-linear contagion.
Returning to the differential equations (4) and (5), with $\nu = 0$ and
$\lambda_{m+k} = \lambda_m + k\psi$, we see that in the case of positive linear contagion ($\psi > 0$) the
system of equations may be infinite. In this case, however, the series
$\sum_k \lambda_{m+k}^{-1}$ clearly diverges so that equality in (9) holds. In the case of negative
linear contagion ($\psi < 0$) it is evident that the condition that the $\lambda_{m+k}$'s be
non-negative places a restriction on n, namely

(26)  $n \le -\lambda_m/\psi, \quad \psi < 0.$

Thus the system of differential equations in (4), (5) must be finite when $\psi < 0$
and the $P_{m,n}$'s form a proper probability distribution.
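
The arithmetic behind (23), (25) and (26) can be sketched in a few lines of Python.
The helper names and the numerical values below are illustrative assumptions, not
taken from the paper.

```python
import math

def lambdas_linear(lam_m, psi, n):
    """Risk parameters lambda_{m+k} = lambda_m + k*psi of (23), k = 0..n."""
    return [lam_m + k * psi for k in range(n + 1)]

def avoidance_factor(psi):
    """Factor e^{-psi} of (25): ratio of the probabilities of avoiding
    accidents in a unit interval after one additional previous accident."""
    return math.exp(-psi)

def max_accidents(lam_m, psi):
    """Largest n permitted by (26); applies to negative contagion only."""
    if psi >= 0:
        raise ValueError("bound (26) applies only to psi < 0")
    return math.floor(-lam_m / psi)
```

For instance, `avoidance_factor(-0.7)` is close to 2 and `avoidance_factor(0.7)` is close
to 1/2, matching the doubling and halving described in the text; with the assumed
values $\lambda_m = 2$ and $\psi = -0.5$, (26) allows at most 4 accidents in the period.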

In the case of negative linear contagion, if one knew $\lambda_m$ (or could conjecture an
upper bound for $\lambda_m$), it would be possible to reject at the outset certain
alternative values of $\psi$: those such that $n > -\lambda_m/\psi$. Thus the particular model of
linear contagion may be criticized in that it places this added restriction on the
degree of negative contagion permissible in the model. In the tests of the next
sections it is assumed that one has at hand only the accident data in the period
of observation, with no knowledge of the number of previous accidents or of the
$\lambda_m$'s. Subject to the limitations of the model, we are then testing the hypothesis
of no contagion versus linear contagion.


3. Joint density functions for the means of the n random variables $\tau_i$, having
density functions (21) and (22). In Section 4 we construct tests of the hypothesis
of no contagion in the class of admissible hypotheses each implying no mixture-no
time effect-linear contagion. The statistic needed for these tests is found to be
the mean of the time intervals, and it is for this reason that the present section
is needed.

We rewrite formulas (22) and (21) here for convenience:

(27)  $p_{\tau_1, \tau_2, \cdots, \tau_n}(t_1, t_2, \cdots, t_n \mid \psi_i = \psi = \nu = 0) = n!,$

(28)  $p_{\tau_1, \tau_2, \cdots, \tau_n}(t_1, t_2, \cdots, t_n \mid \psi_i = \psi; \nu = 0) = n! \left(\dfrac{\psi}{e^{\psi} - 1}\right)^n \exp\left(\psi \sum_{k=1}^{n} t_k\right), \quad 0 < t_1 < t_2 < \cdots < t_n < 1.$


From the sampling theory of order statistics (cf. [7], p. 90) we note that the
unordered $\tau_i$ in (27) are distributed as a random sample from the uniform
distribution on (0, 1), which we shall denote by $p(t \mid \psi = 0)$, and that the unordered
$\tau_i$ in (28) are distributed as a random sample from the distribution with density
on (0, 1) given by

(29)  $p(t \mid \psi) = \dfrac{\psi e^{\psi t}}{e^{\psi} - 1}.$

The density in (29) is equal to that of the exponential function, $f(t) = -\psi e^{\psi t}$,
$\psi < 0$, truncated to (0, 1).
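
Since (29) has the closed-form distribution function $(e^{\psi t} - 1)/(e^{\psi} - 1)$ on (0, 1),
unordered accident times under linear contagion can be simulated by inversion. The
following Python sketch is an illustration under that observation; the function
names are hypothetical and $\psi \ne 0$ is assumed.

```python
import math
import random

def cdf_29(t, psi):
    """Distribution function of the density (29) on (0, 1), psi != 0."""
    return math.expm1(psi * t) / math.expm1(psi)

def quantile_29(u, psi):
    """Inverse of cdf_29: solves (e^{psi t} - 1)/(e^{psi} - 1) = u for t."""
    return math.log1p(u * math.expm1(psi)) / psi

def sample_29(psi, n, rng):
    """n independent draws from (29); rng is a random.Random instance."""
    return [quantile_29(rng.random(), psi) for _ in range(n)]
```

Sorting the output of `sample_29` gives a realization of the ordered times
$\tau_1 < \cdots < \tau_n$ with joint density (28).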

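The distribution of the mean of n independent random variables, each with
uniform distribution on (0, 1), is well known. Laplace [8] derived this
distribution in his Mémoire on the mean inclination of the orbits of comets. (For a more
accessible reference, see [9], Vol. 1, p. 244.) Writing $(\bar\tau \mid \psi = 0)$ for the mean of
the $\tau_i$ corresponding to (27) and $\varphi_1(u \mid \psi = 0)$ for the characteristic function,
we have

(30)  $\varphi_1(u \mid \psi = 0) = \left(\dfrac{e^{iu/n} - 1}{iu/n}\right)^n,$

(31)  $E(\bar\tau \mid \psi = 0) = \tfrac{1}{2}; \quad \sigma^2(\bar\tau \mid \psi = 0) = 1/(12n),$

(32)  $p_1(t \mid \psi = 0) = \dfrac{n^n}{(n-1)!} \sum_{k=0}^{[nt]} (-1)^k \binom{n}{k} \left(t - \dfrac{k}{n}\right)^{n-1}, \quad 0 \le t \le 1,$

where $[nt]$ denotes the greatest integer not exceeding $nt$. As n increases, the
distribution function of the standardized variable

(33)  $z = \sqrt{12n}\,(\bar\tau - \tfrac{1}{2}),$

when $\psi = 0$, rapidly approaches (cf. [10], p. 245) that of the standardized normal
variable.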
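
The density (32) is easy to evaluate for small n. The Python sketch below is a
direct transcription offered for illustration; the function name is hypothetical.

```python
import math

def mean_uniform_density(t, n):
    """Density (32) of the mean of n independent uniform (0, 1) variables."""
    if not 0.0 <= t <= 1.0:
        return 0.0
    s = 0.0
    for k in range(int(math.floor(n * t)) + 1):
        s += (-1) ** k * math.comb(n, k) * (t - k / n) ** (n - 1)
    return n ** n / math.factorial(n - 1) * s
```

For n = 2 the function reproduces the triangular density, equal to 4t on
$(0, \tfrac{1}{2})$ and $4(1 - t)$ on $(\tfrac{1}{2}, 1)$, and for any n it is symmetric about
$t = \tfrac{1}{2}$.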
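
If we now write $(\bar\tau \mid \psi)$ for the mean of the $\tau_i$ corresponding to (28), with
$\varphi_2(u \mid \psi)$ for the characteristic function, we have

(34)  $\varphi_2(u \mid \psi) = \left(\dfrac{\psi}{e^{\psi} - 1}\right)^n \left(\dfrac{e^{(iu/n) + \psi} - 1}{(iu/n) + \psi}\right)^n,$

(35)  $E(\bar\tau \mid \psi) = \dfrac{\psi e^{\psi} - e^{\psi} + 1}{\psi (e^{\psi} - 1)}; \quad \sigma^2(\bar\tau \mid \psi) = \dfrac{(e^{\psi} - 1)^2 - \psi^2 e^{\psi}}{n \psi^2 (e^{\psi} - 1)^2},$

(36)  $p_2(t \mid \psi) = \dfrac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iut}\, \varphi_2(u \mid \psi)\, du = \left(\dfrac{\psi e^{\psi t}}{e^{\psi} - 1}\right)^n p_1(t \mid \psi = 0).$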

4. Tests of the hypothesis of no contagion in the set of admissible hypotheses
each implying no mixture-no time effect-linear contagion. We suppose that
we have observations on several groups of individuals, the r-th group being
determined by the integer r, $r = 1, 2, 3, \cdots$, of accidents sustained by each
individual of that group in a unit time interval $(T_1, T_1 + 1)$. Suppose further
that we have $N_r$ individuals in the r-th group. In this set of observations, if for
some integer j, there are no individuals who have sustained exactly j accidents,
then $N_j$ is zero for this integer j. We define random variables $\tau_{irs}$ in the following
way:

$\tau_{irs}$ is the time from $T_1$ to the i-th accident for the s-th individual in the r-th
group; $i = 1, \cdots, r$; $r = 1, 2, 3, \cdots$; $s = 1, 2, \cdots, N_r$.

Let

(37)  $N = \sum_r r N_r.$

That is, let N be the total number of accidents.

We now make the following assumption: From individual to individual, among
the $\sum_r N_r$ individuals, the accident times are independent.

Under this assumption, the unordered random variables $\tau_{irs}$ act like N
independent observations from a parent population that is either uniform on (0, 1)
if no contagion is present, or has density that of (29) if contagion is present.
Denoting by $\{\tau_j\}$ the unordered set of $\tau_{irs}$'s, we have

(38)  $p_{\{\tau_j\}}(\{t_j\} \mid \psi = 0) = 1$

and

(39)  $p_{\{\tau_j\}}(\{t_j\} \mid \psi) = \left(\dfrac{\psi}{e^{\psi} - 1}\right)^N \exp\left(\psi \sum_{j=1}^{N} t_j\right) = \left(\dfrac{\psi e^{\psi \bar t}}{e^{\psi} - 1}\right)^N,$

where $\bar t = (1/N) \sum_{j=1}^{N} t_j$.

We note that $\bar\tau$ is a sufficient statistic for the distribution of $\tau_j$'s, where

(40)  $\bar\tau = (1/N) \sum_{j=1}^{N} \tau_j,$

that is, the grand mean of all the time intervals for all the individuals.

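Clearly, then, under the present conditions of no mixture-no time effect-linear
contagion, we have a uniformly most powerful test of the hypothesis $\psi = 0$
against either (but not both) of the one-sided alternatives $\psi < 0$ or $\psi > 0$, with
critical regions of type $\bar\tau < t_0$ or $\bar\tau > t_0$, respectively, where $t_0$ is determined by
the level of significance $\alpha$.

Furthermore, letting

(41)  $\phi = \dfrac{\partial}{\partial \psi} \log_e p_{\{\tau_j\}}(\{t_j\} \mid \psi) = N \left(\bar t + \dfrac{1}{\psi} - \dfrac{e^{\psi}}{e^{\psi} - 1}\right)$

and

(42)  $\phi' = \dfrac{\partial \phi}{\partial \psi} = N \left(-\dfrac{1}{\psi^2} + \dfrac{e^{\psi}}{(e^{\psi} - 1)^2}\right),$

we have

(43)  $\phi' = A + B\phi,$

with

(44)  $A = N \left(-\dfrac{1}{\psi^2} + \dfrac{e^{\psi}}{(e^{\psi} - 1)^2}\right)$ and $B = 0$.

That is, [11], we have a uniformly most powerful unbiased (U.M.P.U.) test
of the hypothesis $\psi = 0$ versus alternatives $\psi \ne 0$, having critical region

(45)  $\bar\tau < c_1 + \tfrac{1}{2}$ and $\bar\tau > c_2 + \tfrac{1}{2},$

where

(46)  $\int_{c_1 + \frac{1}{2}}^{c_2 + \frac{1}{2}} p_1(t \mid \psi = 0)\, dt = 1 - \alpha$

and, since the test is U.M.P.U.,

(47)  $\int_{c_1 + \frac{1}{2}}^{c_2 + \frac{1}{2}} (t - \tfrac{1}{2})\, p_1(t \mid \psi = 0)\, dt = 0.$

From (32), we see that

(48)  $p_1(t \mid \psi = 0) = p_1(1 - t \mid \psi = 0)$

so that condition (47) requires that the test be one with "equal tails".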
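
In practice the equal-tails test may be carried out with the grand mean (40) and,
for large N, the normal approximation (33) with n replaced by N. The Python sketch
below illustrates this approximate version only; it is not the exact test based on
(32), and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def grand_mean_test(times, alpha=0.05):
    """Approximate equal-tails test of psi = 0.

    times: all N accident times, each measured from T_1 and lying in (0, 1).
    Returns (tau_bar, z, reject), with z as in (33).
    """
    N = len(times)
    tau_bar = sum(times) / N
    z = math.sqrt(12 * N) * (tau_bar - 0.5)
    c = NormalDist().inv_cdf(1 - alpha / 2)     # two-sided normal critical value
    return tau_bar, z, abs(z) > c
```

A grand mean well below $\tfrac{1}{2}$ points toward negative contagion ($\psi < 0$) and one well
above $\tfrac{1}{2}$ toward positive contagion ($\psi > 0$).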

The writer is grateful to the referee for pointing out that results in a recent 
paper by Lehmann [12] applied to the distribution of time intervals here con- 
sidered, enable one to conclude further that the above uniformly most powerful 
unbiased test is also a uniformly most powerful of all most stringent tests (as 
defined by Wald [13]).

Note that the statistic needed for these tests is the grand mean of all the time
intervals between $T_1$ and the occurrence of each accident for each of the
individuals observed. Since N is the total number of accidents, it is clear that an
N of, say 12, may be obtained by observing 12 individuals, each of whom incurs 
1 accident in the period of observation; or 6 individuals, each of whom incurs 2 
accidents, etc. Consider, for example, one of the sets of data used in Part I of 
the paper [1]. This set of data was taken from the publication [14] of Farmers 
and Chambers and was concerned with the accident records of 166 London bus 
drivers during five successive years of service. The total number of accidents sus- 
tained by these 166 individuals in the five-year period was 1297. Thus, if the 
times of occurrence of these accidents were available, tests of the hypothesis of 
no contagion under the present conditions would have an N of 1297. It is true, 
however, in this example, that the underlying assumption of the independence of 
accident times among the individuals seems unrealistic. 

Note that the manner of construction of the tests of this section requires that
N be fixed, since we are using the distribution of the $\tau$'s, given the total number
of accidents N. Since N is itself a random variable, the question may be raised
as to whether we are not losing some of the information in defining our test
(choosing the critical region) conditioned by the value of N. Furthermore, since
the $N_r$, the number of individuals having exactly r accidents, may still vary with
fixed N, subject only to the condition that $\sum_r r N_r = N$, one may wonder about
possible loss of information in making tests independent of the particular set of
values of the observed random variables $N_r$. We shall show that in the class of
tests satisfying the requirement that the critical regions defining the tests be 
similar with respect to the parameters involved, the tests of this section have the 
property, roughly speaking, of using all of the information provided by the data. 
This fact is a consequence of the special form of the frequency function, under 
the null hypothesis y = 0, of the number of accidents sustained by an individual. 

Let X be the random variable which equals the number of accidents sustained
by an individual in a unit period of observation. Returning to Section 2 of this
paper and letting $\lambda_m = \lambda_{m+1} = \cdots = \lambda_{m+n} = \lambda$ (or by considering the limiting
form of solutions (6) and (7) with $T_2 = T_1 + 1$) we see that the probability of
an individual sustaining exactly n accidents in a unit time interval when $\psi = 0$,
is given by

(49)  $p_X(n) = \dfrac{e^{-\lambda} \lambda^n}{n!}, \quad n = 0, 1, 2, \cdots.$


Now consider the accident data for which the tests of this section were devised.
We have a set of N accident times in a unit period of observation, with
$N = \sum_r r N_r$, involving $\sum_r N_r = K$, say, individuals. For convenience we define
variables $M_i$ as follows:

$M_i$ = the number of accidents sustained by the i-th individual in the unit
time interval, $i = 1, 2, \cdots, K$. Then $N = \sum_{i=1}^{K} M_i$ and each $M_i$ when $\psi = 0$
has a Poisson distribution with unknown mean $\lambda_i$, where $\lambda_i$ is the parameter
pertaining to the Poisson frequency function for the i-th individual's number of
accidents.

Hence, under the assumptions of this section, N is a sum of K independent
Poisson variates and is again a Poisson variate. Then the distribution of N is
complete (cf. [15]). Furthermore, N is also sufficient for the distribution of $\bar\tau$
when $\psi = 0$, since the unordered $\tau_j$'s are known to act, conditionally on N, like
independent random variables uniformly distributed on (0, 1) so that the
distribution of $\bar\tau$ is then independent of the parameters $\lambda_1, \lambda_2, \cdots, \lambda_K$. It follows
from the Lehmann-Scheffé results [15] that the only similar regions are those of
Neyman structure with respect to N, i.e., roughly speaking, those for which the
conditional probability of a point falling therein is equal to $\alpha$, whatever the value
of N.

Furthermore, considering the joint distribution of the $M_i$, $i = 1, 2, \cdots, K$,
we note that it is the product distribution of K independent Poisson variates,
the i-th variate having unknown mean $\lambda_i$. The set $(M_1, M_2, \cdots, M_K)$ being
sufficient for the distribution of $\tau$'s when $\psi = 0$, we need only show the (bounded)
completeness of the joint distribution of the $M_i$ to apply the Lehmann-Scheffé
results. Given that

(50)  $\sum_{m_1, m_2, \cdots, m_K} f(m_1, m_2, \cdots, m_K) \exp\left(-\sum_{i=1}^{K} \lambda_i\right) \prod_{i=1}^{K} \dfrac{\lambda_i^{m_i}}{(m_i)!} = 0$

for every choice of $\lambda_1, \lambda_2, \cdots, \lambda_K$ (and hence in particular, for the case in which
all the $\lambda_i$'s are different), this implies that $f(m_1, m_2, \cdots, m_K) = 0$, since
$f(m_1, m_2, \cdots, m_K)$ is, up to the positive factor $1/\prod_i (m_i)!$, the coefficient of
$\lambda_1^{m_1} \lambda_2^{m_2} \cdots \lambda_K^{m_K}$ in the expansion of 0 in powers
of $\lambda_1, \lambda_2, \cdots, \lambda_K$. That is, the joint distribution of the $M_i$ is complete. By the
Lehmann-Scheffé results, we need then only work conditionally on the $M_i$'s.
But since $N_r$ = the number of individuals sustaining r accidents, the $N_r$'s are
then fixed, which shows that the critical regions selected in our tests are
independent of the particular set of $N_r$'s used subject to the condition $\sum_r r N_r = N$.

The writer is indebted to the referee for pointing out the above analysis of the 
implications of the tests of this section. 

Finally, it should be noted that the tests of this section apply to accident data 
in which each individual has the same length of time of exposure to accidents, 
which (clearly without loss of generality) we took to be a unit period of observa- 
tion. It may be worthwhile to mention the fact that these tests may be generalized 
to use accident data involving periods of exposure to accidents which vary with 
the individual. 

Given, say, K individuals, the i-th individual incurring $n_i$ accidents in an
exposure period of length $L_i$, and given the times $\tau_{ij}$ of occurrence of the j-th
accident for the i-th individual (the times measured from the start of the
individual's period of exposure), we may consider normalized random variables $\sigma_{ij}$
defined by $\sigma_{ij} = \tau_{ij}/L_i$, $i = 1, \cdots, K$; $j = 1, \cdots, n_i$. Then if N is the total
number of accidents, it may easily be shown that $\sum_i \sum_j \sigma_{ij}/N$ has exactly the
same distribution under the null hypothesis $\psi = 0$ as $\sum \sum \tau_{ij}/N$ in the body of
the paper. Hence a test of the hypothesis $\psi = 0$ in this case having desired size
may be obtained just as described in this paper. However, $\sum_i \sum_j \sigma_{ij}/N$ is now
not sufficient and the test will not have the optimum properties of the tests of
this paper. If the $L_i$'s do not differ too much, it is probable that using the
$\sum_i \sum_j \sigma_{ij}/N$ will result in not too great a loss of power.

Actually a sufficient test statistic for the case of varying times of exposure to
accidents is the weighted mean $\sum_i \sum_j L_i \sigma_{ij}/N$ of the $\sigma_{ij}$ (or, the grand mean of
all the actual accident times, $\tau_{ij}$) and the distribution of this statistic under the
null hypothesis is that of the mean of K independent sets of random variables,
the i-th set consisting of $n_i$ independent identical variables uniform on $(0, L_i)$.
This distribution (in the case of $n_i = 1$, $i = 1, \cdots, K$) has been studied recently
by Olds [16].
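
The two statistics for varying exposure times can be written out as follows. This
Python sketch is illustrative only; the function names and the data layout (one
list of accident times per individual) are assumptions.

```python
def normalized_mean(times, lengths):
    """Mean of sigma_ij = tau_ij / L_i over all N accidents.

    times[i] holds the accident times of the i-th individual, measured from
    the start of that individual's exposure; lengths[i] is L_i.
    """
    N = sum(len(ts) for ts in times)
    return sum(t / L for ts, L in zip(times, lengths) for t in ts) / N

def weighted_mean(times, lengths):
    """Weighted mean of the sigma_ij with weights L_i, which equals the grand
    mean of the actual accident times tau_ij."""
    N = sum(len(ts) for ts in times)
    return sum(L * (t / L) for ts, L in zip(times, lengths) for t in ts) / N
```

Under $\psi = 0$ the first statistic is distributed as the grand mean of Section 4, so
the critical values obtained there apply to it unchanged; the second requires the
distribution described in the preceding paragraph.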

Further discussion of the more general situation of varying times of exposure 
will be left for a later paper. In the following section we return again to the case 
of one period of observation for the accident times. 


5. Power function for the UMPU test of the hypothesis of no contagion in 
the set of admissible hypotheses implying no mixture-no time effect-linear 
contagion. Since we shall be interested in detecting the presence of contagion of 
either kind (positive or negative) we shall consider in this section the problem of 
computing the power function for the UMPU test of Section 4. 

Using (32) and (36), the exact power function $P(\psi)$ for a given test size $\alpha$ is
given by

(51)  $P(\psi) = 1 - \int_{\frac{1}{2} - c_N}^{\frac{1}{2} + c_N} p_2(t \mid \psi)\, dt$

with

(52)  $\alpha/2 = \int_0^{\frac{1}{2} - c_N} p_1(t \mid \psi = 0)\, dt,$

where the notation $c_N$ is used to emphasize the dependence of this value on N.
It is obvious from the form of the density functions (32) and (36), that the
computation of the exact power is a tedious procedure, even for N relatively
small. Indeed, just the determination of the critical region boundaries by the
numerical solution of the polynomial equation obtained from (32) is
time-consuming. Since the variable $\bar\tau$ with density function (32) is asymptotically normal
$(\tfrac{1}{2}, 1/\sqrt{12N})$, a first step in obtaining the approximate power consists in
approximating, for a given level of significance $\alpha$, the critical region boundaries

(53)  $\bar\tau = \tfrac{1}{2} \pm c_N, \quad c_N \approx c/\sqrt{12N},$

where

(54)  $\int_{-c}^{c} \dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx = 1 - \alpha.$

Table I below gives a comparison of this approximation to $c_N$ with the true
values of $c_N$, for N = 3, 6, 9, 12 and $\alpha = .05$.


TABLE I
Comparison of $c_N$ and $c/\sqrt{12N}$ for N = 3, 6, 9, 12 and $\alpha = .05$.

.22931
.18769
.16275
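
The comparison in Table I can be reproduced numerically. The Python sketch below
solves (52) for $c_N$ by bisection, using the distribution function of the sum of N
uniform variables (the Irwin-Hall distribution), and computes the normal
approximation of (53). The function names and the choice of bisection are
assumptions made for illustration.

```python
import math
from statistics import NormalDist

def mean_uniform_cdf(t, n):
    """P(mean of n independent uniform (0, 1) variables <= t)."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    x = n * t
    s = sum((-1) ** k * math.comb(n, k) * (x - k) ** n
            for k in range(int(math.floor(x)) + 1))
    return s / math.factorial(n)

def exact_c(n, alpha=0.05):
    """Solve (52) for c_N by bisection on (0, 1/2)."""
    lo, hi = 0.0, 0.5
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if mean_uniform_cdf(0.5 - mid, n) > alpha / 2:
            lo = mid          # lower tail still too large: widen the region
        else:
            hi = mid
    return 0.5 * (lo + hi)

def approx_c(n, alpha=0.05):
    """Normal approximation c / sqrt(12 N) of (53), with c from (54)."""
    return NormalDist().inv_cdf(1 - alpha / 2) / math.sqrt(12 * n)
```

For N = 6 and $\alpha = .05$ this sketch gives an exact value of about .229 against an
approximation of about .231, so the normal approximation is already close for quite
small N.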


Secondly, by the Central Limit Theorem, the variable $\bar\tau$ with density function
(36) is asymptotically normal $(\mu, \sigma)$ with

(55)  $\mu = \dfrac{\psi e^{\psi} - e^{\psi} + 1}{\psi (e^{\psi} - 1)}, \quad \sigma^2 = \dfrac{(e^{\psi} - 1)^2 - \psi^2 e^{\psi}}{N \psi^2 (e^{\psi} - 1)^2}.$

We obtain, then, as an approximation, say $P_1(\psi)$, to the power function,

(56)  $P_1(\psi) = 1 - \int_{\theta_1}^{\theta_2} \dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx$

with

(57)  $\theta_1 = \dfrac{\tfrac{1}{2} - \mu - c/\sqrt{12N}}{\sigma}, \quad \theta_2 = \dfrac{\tfrac{1}{2} - \mu + c/\sqrt{12N}}{\sigma}.$

Using the above approximation, Table II shows values of N needed for the
power (as approximated) to exceed .90 for various values of $\psi$ in the case of the
UMPU test at the 5 per cent level of significance. The particular values of $\psi$
used in this table were selected because of the significance of the factor $e^{-\psi}$
discussed in Section 2. Thus, the detection of a $\psi$ such that $e^{-\psi} = 2$ (or $\tfrac{1}{2}$) would
seem to imply a rather high order of contagion in that the occurrence of each
additional accident tends to double (or halve) the previous probability of
avoiding accidents in a unit period.


TABLE II

Values of N required for power to be at least .90 for 5 per cent UMPU test, when
$e^{-\psi}$ has specified values of table.

$e^{-\psi}$ | 1.2  | 1.4  | 1.5 | 1.6 | 1.8 | 2.0
N           | 3825 | 1120 | 770 | 575 | 370 | 265

In view of the fact that N here is the total number of accidents, rather than the 
number of individuals, it would seem that the power of this test is fairly good. 


ON THE RATIO OF VARIANCES IN THE MIXED INCOMPLETE
BLOCK MODEL


By W. A. THOMPSON, JR.


Virginia Polytechnic Institute and University of North Carolina 


Summary. The present article is an extension of Wald’s paper, [12] “A note 
on regression analysis.” Confidence interval and testing procedures on a ratio 
of variances are given for continuously more specific models with a corresponding 
increase in the preciseness of the results. Finally, Linked Block Designs and 
Duals of Partially Balanced Designs with two associative classes are discussed 
and it is found that here the analysis is quite simple indeed. (Ordinary Lattices 
belong to this last group.) 


1. Introduction. Though the experimenter has been using the variance com- 
ponents model for nearly as long as the fixed effects model, there has been 
relatively little success in developing a complete theory such as the least squares 
approach to the fixed effect case. Among the few theoretical papers on this 
subject is a series due to Abraham Wald which culminates in his 1947 paper 
[12]. There he outlines a method of placing a confidence interval on a ratio of 
variances. The actual application of this method in the nonorthogonal case will 
depend on the solution of an equation of the n-th degree where n would in 
general be large. The question naturally arises to what kind of designs can 
Wald’s method be applied in practice without unduly complicated calculations. 
One object of this paper is to answer this question so far as it relates to in- 
complete block designs. 

In Section 2 a set of sufficient statistics is derived for the variance com- 
ponents model with errors arising from only two sources. These statistics are 
then used in Section 3 to derive confidence intervals and tests of hypotheses 
concerning the ratio u of the two components of variance. In Section 4 these 
results are applied to incomplete designs. It is shown that if we have a design 
with linked blocks, i.e., any two blocks have the same number of treatments 
in common, then the formulae for finding the confidence interval take a very 
simple form. 

It appears that for carrying out tests simply or for assigning confidence
intervals to the ratio of the per plot error to the block error one must balance
the design with respect to the blocks, just as a balance with respect to the
treatments enables one to estimate with ease, and carry out tests concerning,
'treatment effects'. This line of thought has been pursued by investigating 'partially
linked' designs, for which the configuration of the blocks obeys the same
restrictions as does the configuration of treatments for a 'partially balanced'
design. It has been shown that the matrix of the least square equations which
would arise in estimating the block effects when regarded as fixed has not
more than m* distinct non-zero characteristic roots, if there are m* associate
classes (with respect to the blocks). In the important practical case when m* =
2, there are two distinct characteristic roots, which are roots of the quadratic
$e^2 - He + \Delta = 0$ where H and $\Delta$ are the same constants which appear in the
interblock analysis of the dual of the design. Since these constants have already
been tabulated for all known 2 associative class designs [3], the calculation of
the roots $e'$ and $e''$ is very simple. The confidence interval and Wald's test
depend only on these roots, and the actual sums of squares involved can be simply
calculated. Results regarding the number of distinct characteristic roots have
been very recently obtained by Connor and Clatworthy [5] in an entirely
different connection (viz., the combinatorial properties of balanced incomplete
block designs). It thus appears that designs which are completely or partially
balanced with respect to treatments, and completely or partially linked with
respect to blocks are of special importance.


2. The least squares model with errors arising from two sources. 

2.1 Motivation. The purpose of this section is to derive a set of sufficient 
statistics for the ‘‘mixed” variance components model with errors arising from 
only two sources. We do this by treating our observations as random variables 
whose conditional distribution is the same as that assumed unconditionally in 
the ordinary least squares problem. This approach is that of Wald [12]. 

2.2 Notation. We shall be using vector and matrix notation throughout the
rest of this paper. Small Roman or Greek letters will denote vectors or scalars
while the capital Roman letters will be reserved for matrices. All vectors will
be column vectors unless they are primed, in which case they will be row
vectors. $X = X(N \times b + u)$ will mean that the matrix X has N rows and
b + u columns.

$A = \begin{pmatrix} A_1 & A_2 \\ A_3 & A_4 \end{pmatrix}$ will mean that the
matrix A has been partitioned into the four matrices $A_1, A_2, A_3, A_4$ and
that these last four matrices have the positions indicated.

2.3 A theorem in least squares. Let $y_1, y_2, \ldots, y_N$ be independent
random variables with a common variance $\sigma^2$ and let


2.3.1    $y' = (y_1, y_2, \ldots, y_N)$.


Suppose in addition that


2.3.2    $E(y) = X\beta = (X_1, X_2)\begin{pmatrix} \beta_{(1)} \\ \beta_{(2)} \end{pmatrix}$


where


$X = X(N \times b + u) = (X_1(N \times u), X_2(N \times b)) = (X_1, X_2)$


is a matrix of known constants and $\beta$ is a column vector whose elements are
the unknown parameters $\beta_1, \beta_2, \ldots, \beta_u, \beta_{u+1},
\beta_{u+2}, \ldots, \beta_{b+u}$. We also introduce the notation


2.3.3    $A = A(b + u \times b + u) = \begin{pmatrix} A_1 & A_2 \\ A_2' & A_3 \end{pmatrix}$


where


$A_1 = X_1'X_1, \quad A_2' = X_2'X_1, \quad A_3 = X_2'X_2.$


It is a well known theorem in least squares that if g* is a linear function of
the observations, and if g* estimates a linear function of $\beta_{u+1},
\beta_{u+2}, \ldots, \beta_{u+b}$, then g* must be a linear function of the
elements of


2.3.4    $p = p(b \times 1) = (X_2' - A_2'A_1^{-1}X_1')y$.


Note also $E(p) = D\beta_{(2)}$, where D is equal to $A_3 - A_2'A_1^{-1}A_2$,
and from the formulae for the variances and covariances of linear functions we
see that the variance-covariance matrix of the p's is:


2.3.5    $(X_2' - A_2'A_1^{-1}X_1')(X_2' - A_2'A_1^{-1}X_1')'\sigma^2 = D\sigma^2$.


We define $g_1 = g_1(u \times 1) = X_1'y$. The elements of the vector $g_1$
generate a vector space $V_1$ of linear functions. This space has
dimensionality u since we have assumed that $A_1 (= X_1'X_1)$ has an inverse.
The elements of p also generate a vector space of linear functions. We denote
this space by $V_2$ and its dimensionality by r, say. A short calculation shows
that the bases of $V_1$ and $V_2$ are orthogonal and hence the spaces are
orthogonal. We now define the space V to be that orthogonal to $V_1$ and $V_2$.
The dimensionality of V then must be N - u - r. We may now choose an orthogonal
basis for V, say $Y_1, \ldots, Y_{N-u-r}$; it is easily proved that
$E(Y_i) = 0$. We remark for future reference that $\sum Y_i^2$ is the sum of
squares due to error.
We record


2.3.6    $E(g_1) = A_1\beta_{(1)} + A_2\beta_{(2)}$,
         $E(p) = D\beta_{(2)}$,
         $E(Y_i) = 0$.

And the covariance matrix of the elements of Y*, where


$Y^{*\prime} = (g_1', p', Y_1, \ldots, Y_{N-u-r})$,


is


2.3.7    $\begin{pmatrix} A_1 & 0 & 0 \\ 0 & D & 0 \\ 0 & 0 & I \end{pmatrix}\sigma^2$.


2.4 Variance components and conditional random variables. We will now change
our assumptions somewhat from Section 2.3. Suppose now that $y_1, \ldots, y_N$
are independent and normal for given $\beta_{(2)}$ with means
$E(y/\beta_{(2)}) = X\beta$ and variances $\sigma^2$. In addition suppose that
the elements of $\beta_{(2)}$ are independent and identically normal with means
0 and variances $\zeta^2$. We see from 2.3.6 and 2.3.7 that


         $E(g_1/\beta_{(2)}) = A_1\beta_{(1)} + A_2\beta_{(2)}$,
2.4.1    $E(p/\beta_{(2)}) = D\beta_{(2)}$,
         $E(Y_i/\beta_{(2)}) = 0$.


And the covariance matrix of these conditional random variables is


2.4.2    $\begin{pmatrix} A_1 & 0 & 0 \\ 0 & D & 0 \\ 0 & 0 & I \end{pmatrix}\sigma^2$.


We now state several lemmas, the last two of which were independently 
developed by Madow [9] and Skibinsky [10]. 


         1) $E\,E(x/z) = E(x)$
2.4.3    2) $\mathrm{Var}(x) = E\,\mathrm{Var}(x/z) + \mathrm{Var}\,E(x/z)$
         3) $\mathrm{Cov}(x, y) = E\,\mathrm{Cov}(x, y/z) + \mathrm{Cov}[E(x/z), E(y/z)]$.
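
As an illustrative check of lemmas 1) and 2), not part of the original
derivation, the following sketch evaluates both sides exactly for a small
discrete joint distribution of (x, z). The distribution and all function names
are hypothetical and were chosen only to keep the arithmetic exact.

```python
from fractions import Fraction as Fr

# Hypothetical joint distribution: keys are (x, z), values are probabilities.
joint = {(0, 0): Fr(1, 8), (1, 0): Fr(3, 8), (2, 1): Fr(1, 4), (4, 1): Fr(1, 4)}

def mean(dist):
    """Mean of a distribution given as {value: probability}."""
    return sum(v * p for v, p in dist.items())

def variance(dist):
    """Variance of a distribution given as {value: probability}."""
    m = mean(dist)
    return sum((v - m) ** 2 * p for v, p in dist.items())

def accumulate(pairs):
    """Collapse (value, probability) pairs into a {value: probability} dict."""
    out = {}
    for v, p in pairs:
        out[v] = out.get(v, 0) + p
    return out

# Marginal distributions of x and of z.
dist_x = accumulate((x, p) for (x, z), p in joint.items())
dist_z = accumulate((z, p) for (x, z), p in joint.items())

# Conditional distribution of x for each value of z.
cond = {z0: accumulate((x, p / pz) for (x, z), p in joint.items() if z == z0)
        for z0, pz in dist_z.items()}
cond_mean = {z0: mean(d) for z0, d in cond.items()}
cond_var = {z0: variance(d) for z0, d in cond.items()}

# Lemma 1: E E(x/z) = E(x).
lhs_mean = sum(cond_mean[z0] * pz for z0, pz in dist_z.items())

# Lemma 2: Var(x) = E Var(x/z) + Var E(x/z).
e_var = sum(cond_var[z0] * pz for z0, pz in dist_z.items())
var_e = variance(accumulate((cond_mean[z0], pz) for z0, pz in dist_z.items()))

print(lhs_mean == mean(dist_x), e_var + var_e == variance(dist_x))
```

Lemma 3) can be checked in the same way by carrying a second variable y in the
joint table.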


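Applying these lemmas we find the unconditional means to be


2.4.4    $E(g_1) = A_1\beta_{(1)}$; $E(p) = 0$, $E(Y_i) = 0$.

The unconditional covariance matrix of these same variables is

2.4.5    $\begin{pmatrix} A_1 & 0 & 0 \\ 0 & D & 0 \\ 0 & 0 & I \end{pmatrix}\sigma^2 + \begin{pmatrix} A_2A_2' & A_2D & 0 \\ (A_2D)' & D^2 & 0 \\ 0 & 0 & 0 \end{pmatrix}\zeta^2$


where $D = (A_3 - A_2'A_1^{-1}A_2)$.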

We may make an orthogonal transformation, p = Mz, in a manner entirely
analogous to that of [11] Section 1 and find that $z_1, \ldots, z_r$ and
$\sum Y_i^2$ are a set of joint sufficient statistics for the parameters
$\beta_1, \ldots, \beta_u$, $\sigma^2$ and $\zeta^2$.


3. The Ratio $\zeta^2/\sigma^2$.

3.1 Motivation. In this section we continue assuming the variance components
model discussed in Section 2. In that section we found a set of statistics to
which we may restrict ourselves in inference problems concerning all the
parameters of the distribution. Using those statistics we will consider
confidence interval and hypothesis testing problems involving the ratio
$\zeta^2/\sigma^2 = \mu$, say.

3.2 The Wald confidence interval [12] for $\mu$ and an associated test. We may
verify that


3.2.1    $F = F(\mu) = \frac{n}{r}\sum_{i=1}^{r}\frac{z_i^2}{e_i + e_i^2\mu} \Big/ \sum_{j=1}^{n} Y_j^2$


actually has the F distribution with r and n degrees of freedom. This together
with the fact that $F(\mu)$ is a monotone decreasing function enables Wald to
find a confidence interval for $\mu$. We summarize his result as:

THEOREM 1. If $F_1$ and $F_2$ are chosen so that $\Pr(F_1 \le F \le F_2) = 1 - \alpha$, and
if $\mu_s$ is the largest root in $\mu$ of $F(\mu) = F_s$, s = 1, 2, where $F(\mu)$ is given by
3.2.1, then $(\mu_2, \mu_1)$ is a $1 - \alpha$ confidence interval for $\mu$ if $\mu_2$ is positive
and $(0, \mu_1)$ is a $1 - \alpha$ confidence interval if $\mu_2$ is negative. If $\mu_1$ is
negative then (0, 0) is a degenerate $1 - \alpha$ confidence interval.

Of course, this confidence interval is not unique, as many choices of $F_1$
and $F_2$ are possible. For example, if $F_2 = \infty$ then $\mu_2 = 0$ and
Wald's procedure gives an upper confidence limit; if $F_1 = 0$ then
$\mu_1 = \infty$ and we have a lower confidence bound for $\mu$.
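
The interval of Theorem 1 can be computed numerically. The following is a
minimal sketch, assuming that every $e_i$ is positive so that $F(\mu)$ is
strictly decreasing for $\mu \ge 0$; it solves $F(\mu) = F_s$ by bisection and
substitutes zero when the root is not positive. The function names, the data
and the values of $F_1$ and $F_2$ are hypothetical; in practice $F_1$ and $F_2$
would be taken from tables of the F distribution with r and n degrees of
freedom.

```python
def wald_F(mu, z, e, Y):
    """F(mu) as in 3.2.1: (n/r) * sum z_i^2/(e_i + e_i^2 mu) / sum Y_j^2."""
    r, n = len(z), len(Y)
    numerator = sum(zi * zi / (ei + ei * ei * mu) for zi, ei in zip(z, e))
    return (n / r) * numerator / sum(yj * yj for yj in Y)

def confidence_limit(F_s, z, e, Y, tol=1e-12):
    """Root of F(mu) = F_s on mu >= 0; 0.0 is returned when the root is not positive."""
    if F_s <= 0:
        return float("inf")          # F(mu) > 0 for all finite mu
    if wald_F(0.0, z, e, Y) <= F_s:
        return 0.0                   # root is zero or negative
    lo, hi = 0.0, 1.0
    while wald_F(hi, z, e, Y) > F_s: # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if wald_F(mid, z, e, Y) > F_s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical data: r = 3 transformed block statistics, n = 6 error contrasts.
z = [1.9, -2.4, 0.7]
e = [2.0, 2.0, 0.5]
Y = [0.3, -0.8, 0.5, 1.1, -0.4, 0.2]
F1, F2 = 0.2, 3.0                    # assumed lower and upper F points

mu1 = confidence_limit(F1, z, e, Y)  # upper end of the interval
mu2 = confidence_limit(F2, z, e, Y)  # lower end of the interval
print((mu2, mu1))
```

Because $F(\mu)$ decreases in $\mu$, the smaller percentage point $F_1$ yields
the larger limit $\mu_1$.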

We may also derive a test of $H_0: \mu \le \mu_0$ vs. $H_1: \mu > \mu_0$ in this manner.

THEOREM 2. If $F(\mu_0)$ is given by 3.2.1 and c is determined appropriately, if
we are testing $H_0: \mu \le \mu_0$ vs. $H_1: \mu > \mu_0$, and if we accept $H_0$ when
$F(\mu_0) < c$ and otherwise reject $H_0$; then the power function of this test is an
increasing function of $\mu$.

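The power function is as follows:


$\beta(\mu) = \text{const.}\int_{F(\mu_0) > c} \exp\left[-\tfrac12\left(\sum_j \frac{Y_j^2}{\sigma^2} + \sum_i \frac{z_i^2}{\sigma^2(e_i + e_i^2\mu)}\right)\right] dY\,dz$

$\qquad = \text{const.}\int_{G(\mu) > c} \exp\left[-\tfrac12\left(\sum f_j^2 + \sum g_i^2\right)\right] df\,dg,$


where


$G(\mu) = \frac{n}{r}\sum_i \frac{e_i + e_i^2\mu}{e_i + e_i^2\mu_0}\,g_i^2 \Big/ \sum_j f_j^2,$


and $g_i = z_i/[\sigma(e_i + e_i^2\mu)^{1/2}]$; i = 1, ..., r, and
$f_j = Y_j/\sigma$; j = 1, ..., n. Thus $G(\mu)$ is an increasing function of
$\mu$. Now if $G_1(\mu) \le G_2(\mu)$ then $c < G_1(\mu)$ implies that
$c < G_2(\mu)$ so that


$\int_{G_1(\mu) > c} dF \le \int_{G_2(\mu) > c} dF$


where $dF = \text{const.}\exp[-\tfrac12(\sum f_j^2 + \sum g_i^2)]\,df\,dg$.
Therefore $\beta$ is an increasing function of G and thus of $\mu$.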

Thus if we choose c so that $\beta(\mu_0) = \alpha$ then we have an $\alpha$
level test which has appropriate power properties. These appropriate power
properties are, of course, that we are more likely to say $\mu < \mu_0$ the
smaller $\mu$ is and we are less likely to say that $\mu < \mu_0$ the larger
$\mu$ is. $\beta(\mu_0)$ is the probability that a statistic with the F
distribution exceeds a constant c. Hence c is chosen so that
$1 - F_{r,n}(c) = \alpha$ where $F_{r,n}$ is the cumulative F distribution with
r degrees of freedom in the numerator and n in the denominator. Note that when
$\mu_0$ is 0


$F(\mu_0) = \frac{n}{r}\sum_i \frac{z_i^2}{e_i} \Big/ \sum_j Y_j^2.$


4. Some properties of incomplete block designs.

4.1 Motivation. In 4.2 we will see what some of the formulae and theory of 2.3
and 2.4 say when specialized to incomplete block designs. Then in 4.3 and 4.4
we will consider the computation of

$\sum \frac{z_i^2}{e_i + e_i^2\mu}$

in some particularly interesting special cases. It will be remembered that this
is the quantity on which the test and the confidence interval procedures of
Section 3 are based.

4.2 Application to General Incomplete Block Designs. We now consider
$y_{ij}$ (i = 1, ..., u; j = 1, ..., b) to be the “yield” from the i-th
“treatment” and j-th “block” of a statistical experiment using an incomplete
block design. The reason for the quotes above is to remind the reader that
these terms may apply to applications which are not at all agricultural in
nature. We further assume that the $y_{ij}$'s are independent and normally
distributed random variables for given block effects $\alpha_1, \ldots,
\alpha_b$ with means: $E(y_{ij}/\alpha_1, \ldots, \alpha_b) =
n_{ij}(\tau_i + \alpha_j)$ and variance $\sigma^2$. Here $n_{ij}$ is 1 or 0
according as the i-th treatment does or does not occur in the j-th block. Thus
$\tau_i$ is the i-th treatment effect. Since only those $y_{ij}$ are considered
for which $n_{ij} = 1$, the total number of observations $y_{ij}$ is N (i.e.
$\sum_{ij} n_{ij} = N$). In addition the $\alpha$'s are independent and
identically normally distributed with mean 0 and variance $\zeta^2$. Note that
if the $\alpha$'s were unknown parameters instead of random variables then we
would have the general incomplete block model with fixed effects which appears
in analysis of variance (see for example Bose [1]).


We may see that this model is a special case of the one described in 2.4 by
setting X equal to the matrix of zeros and ones, with u + b columns, in which
the row corresponding to the observation $y_{ij}$ has a one in the i-th column
and in the (u + j)-th column and zeros elsewhere, where if the i-th treatment
does not occur in the j-th block, that is $n_{ij} = 0$, then it is understood
that the corresponding row is missing from X. We must also let $\beta$ be the
column vector whose elements are $\tau_1, \tau_2, \ldots, \tau_u, \alpha_1,
\alpha_2, \ldots, \alpha_b$. The r of Section 2 here becomes b - 1 and $A_1$
and $A_3$ become diagonal matrices with $\sum_j n_{ij}$ and $\sum_i n_{ij}$
respectively in the diagonal. It is easy to see that $\sum_j n_{ij} = r_i$, the
number of blocks in which the i-th treatment appears, and
$\sum_i n_{ij} = k_j$, the number of treatments in the j-th block. Thus

$A_1 = \begin{pmatrix} r_1 & & 0 \\ & \ddots & \\ 0 & & r_u \end{pmatrix}, \quad
A_3 = \begin{pmatrix} k_1 & & 0 \\ & \ddots & \\ 0 & & k_b \end{pmatrix}, \quad
A_2' = \begin{pmatrix} n_{11} & n_{21} & \cdots & n_{u1} \\ n_{12} & n_{22} & \cdots & n_{u2} \\ \vdots & \vdots & & \vdots \\ n_{1b} & n_{2b} & \cdots & n_{ub} \end{pmatrix}$


and hence $d_{ij}$, the general element of $D = A_3 - A_2'A_1^{-1}A_2$, is given by

4.2.5    $d_{ij} = \delta_{ij}k_i - \sum_m \frac{n_{mi}n_{mj}}{r_m}$


where $\delta_{ij}$ is Kronecker's delta.
Let $\lambda_{ij}$ be the number of treatments occurring both in the i-th and
j-th blocks. Then if in particular $r_1 = r_2 = \cdots = r_u = r$ (say) we have

4.2.6    $d_{ij} = \delta_{ij}k_i - \frac{\lambda_{ij}}{r}$.


Remember from 2.3.4 that $p = (X_2' - A_2'A_1^{-1}X_1')y$; and hence if we have an
incomplete block design we find that


4.2.7    $p_i = B_i - \sum_m \frac{n_{mi}T_m}{r_m}$,    i = 1, ..., b,

which are known as adjusted block totals. Here $B_i$ is the total of the i-th
block and $T_j$ is the total of the j-th treatment.


4.3 Linked Block Designs. An incomplete block design has been defined to be
a linked block design if

(a) Each block has the same number of treatments k,

(b) Each treatment occurs in r blocks,

(c) Any two blocks have the same number of treatments $\lambda^*$ in common.
These designs were used by Youden [13], and are duals of the well-known
balanced incomplete block designs. In this case our matrix D is of the form


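4.3.1    $D = \begin{pmatrix} k(1 - 1/r) & -\lambda^*/r & \cdots & -\lambda^*/r \\ -\lambda^*/r & k(1 - 1/r) & \cdots & -\lambda^*/r \\ \vdots & \vdots & & \vdots \\ -\lambda^*/r & -\lambda^*/r & \cdots & k(1 - 1/r) \end{pmatrix}$


where $\lambda^*$ is the number of times two blocks contain the same treatment.
$|D - eI| = [(1/r)(k[r - 1] + \lambda^*) - e]^{b-1}(-e)$, since
$k(r - 1) = (b - 1)\lambda^*$. Thus the characteristic roots of D are 0 and
$e = (1/r)(k[r - 1] + \lambda^*)$, the latter of multiplicity b - 1.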
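
The root structure just described can be checked on a small example. The
following sketch is an illustration only: it builds D from the incidence
numbers by the general formula $d_{ij} = \delta_{ij}k_i - \sum_m
n_{mi}n_{mj}/r_m$ for a hypothetical linked block design (the dual of the
balanced incomplete block design consisting of all pairs of four treatments)
and confirms that a contrast among blocks is reproduced with the factor
$(1/r)(k[r - 1] + \lambda^*)$. The design and all variable names are assumptions
made for the example.

```python
# Hypothetical linked block design: u = 6 treatments in b = 4 blocks of k = 3,
# each treatment in r = 2 blocks, any two blocks sharing exactly one treatment.
blocks = [{0, 1, 2}, {0, 3, 4}, {1, 3, 5}, {2, 4, 5}]
u, b = 6, len(blocks)

# Incidence numbers n[i][j] = 1 if treatment i occurs in block j, else 0.
n = [[1 if i in blk else 0 for blk in blocks] for i in range(u)]
r_i = [sum(row) for row in n]                                # replications
k_j = [sum(n[i][j] for i in range(u)) for j in range(b)]     # block sizes

# General element of D (formula 4.2.5 of the text).
D = [[(k_j[i] if i == j else 0)
      - sum(n[m][i] * n[m][j] / r_i[m] for m in range(u))
      for j in range(b)] for i in range(b)]

k, r = k_j[0], r_i[0]
lam = len(blocks[0] & blocks[1])          # common treatments of two blocks
e = (k * (r - 1) + lam) / r               # non-zero characteristic root

def matvec(A, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

ones = [1] * b
contrast = [1, -1, 0, 0]
print(matvec(D, ones), matvec(D, contrast), e)
```

Here D annihilates the vector of ones (root 0) and multiplies every block
contrast by e, which is the multiplicity b - 1 root.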

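If now we make our orthogonal transformation of 2.4 from p's to z's, we
find that $\sum z_i^2/(e_i + e_i^2\mu) = \sum z_i^2/(e + e^2\mu)$, since all the
non zero characteristic roots are the same. However, since our transformation
from p's to z's was orthogonal, $\sum p_i^2 = \sum z_i^2$, and


4.3.2    $F(\mu) = \frac{N - u - b + 1}{b - 1}\cdot\frac{1}{e + e^2\mu}\cdot\frac{\sum p_i^2}{\sum Y_j^2}$.

Now from 4.3.2, we know that $\mu_s$ is the solution in $\mu$ of

$\frac{N - u - b + 1}{b - 1}\cdot\frac{1}{e + e^2\mu}\cdot\frac{\sum p_i^2}{\sum Y_j^2} = F_s$


where s is either 1 or 2. More explicitly


4.3.3    $\mu_s = \frac{1}{e^2}\left(\frac{1}{F_s}\cdot\frac{N - u - b + 1}{b - 1}\cdot\frac{\sum p_i^2}{\sum Y_j^2} - e\right)$.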


Theorem 1 now supplies two-sided as well as one-sided confidence intervals
for $\mu$. Theorem 2 supplies a test of the hypothesis $H_0: \mu \le \mu_0$ vs.
$H_1: \mu > \mu_0$; here r = b - 1 and n = N - u - b + 1.
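
For a linked block design the confidence limits need no iteration. The sketch
below is a hypothetical illustration, assuming that the statistic has the form
$F(\mu) = \frac{N-u-b+1}{b-1}\cdot\frac{1}{e+e^2\mu}\cdot\frac{\sum p_i^2}{\sum
Y_j^2}$ and solving $F(\mu) = F_s$ explicitly; the numerical inputs (sums of
squares, design constants and F points) are invented for the example.

```python
def F_linked(mu, sum_p2, sum_Y2, N, u, b, e):
    """Statistic for a linked block design with single non-zero root e."""
    return (N - u - b + 1) / (b - 1) * sum_p2 / sum_Y2 / (e + e * e * mu)

def mu_linked(F_s, sum_p2, sum_Y2, N, u, b, e):
    """Explicit solution of F(mu) = F_s, with zero substituted for a negative root."""
    c = (N - u - b + 1) / (b - 1) * sum_p2 / sum_Y2
    return max(0.0, (c / F_s - e) / (e * e))

# Hypothetical inputs: N = 12 plots, u = 6 treatments, b = 4 blocks, e = 2.
args = dict(sum_p2=18.0, sum_Y2=1.5, N=12, u=6, b=4, e=2.0)
F1, F2 = 1.5, 4.0                       # assumed lower and upper F points

mu_1 = mu_linked(F1, **args)            # upper confidence limit
mu_2 = mu_linked(F2, **args)            # lower confidence limit
print((mu_2, mu_1))
```

With these inputs the interval is (0.25, 1.5); substituting either limit back
into `F_linked` returns the corresponding F point.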

4.4 Partially Linked Designs. The dual of an incomplete block design is
obtained by interchanging the roles of the treatments and blocks. We now define
a class of designs which are duals of the well known Partially Balanced
Incomplete Block Designs. In analogy with Linked Block Designs we define the
Partially Linked Designs to be those which satisfy the following conditions:

(i) The experimental material is divided into b blocks of k units each, dif- 
ferent treatments being applied to the units in the same block. 

(ii) There are u treatments, each of which occurs in r blocks. 

(iii) There can be established a relation of association between any two 
blocks satisfying the following requirements: 

a) Two blocks are either 1st, 2nd, ..., or m*-th associates.

b) Each block has exactly $n_i^*$ i-th associates (i = 1, 2, ..., m*).

c) Given any two blocks which are i-th associates, the number of blocks
common to the j-th associates of the first, and the k-th associates of the
second is $p_{jk}^{i*}$ and is independent of the pair of blocks with which we
start. Also $p_{jk}^{i*} = p_{kj}^{i*}$.

(iv) Two blocks which are i-th associates contain exactly $\lambda_i^*$ common
treatments. Hence for a partially linked incomplete block design


$d_{ij} = \delta_{ij}k_i - \sum_m \frac{n_{mi}n_{mj}}{r} = \begin{cases} k(1 - 1/r) & \text{for } i = j \\ -\lambda_s^*/r & \text{for } i \ne j \end{cases}$

where blocks i and j are s-th associates.

The following identities hold between the parameters of partially linked
designs:


$bk = ur, \qquad n_1^* + n_2^* + \cdots + n_{m^*}^* = b - 1,$

$n_1^*\lambda_1^* + n_2^*\lambda_2^* + \cdots + n_{m^*}^*\lambda_{m^*}^* = k(r - 1),$

$\sum_k p_{jk}^{i*} = \begin{cases} n_j^* & \text{for } i \ne j \\ n_j^* - 1 & \text{for } i = j, \end{cases}$

$n_i^*p_{jk}^{i*} = n_j^*p_{ik}^{j*} = n_k^*p_{ij}^{k*}.$

The next theorem follows easily from the work of Connor and Clatworthy
[5].
THEOREM 3. If e is a non-zero characteristic root of $D = (d_{ij})$, then e is a root
of the following determinantal equation

k(1 — 1/r) + AT/r —e k(1 — 1/r) + AB /r —e 
1* 
P21 


* * Az Pit * 
- Ai — Az) kl = 1/r) + '- - ; (Ai — 2 


- 2 


We denote the two roots of this equation by e' and e''. The method used by
Connor and Clatworthy will also give the multiplicities $p_1$, $p_2$ of e', e''
in terms of e' and e''. We do not use the exact values of $p_1$, $p_2$ in this
paper, but only the fact that they are positive integers which sum to b - 1.

We may verify that if H* is the sum of the two characteristic roots and if
$\Delta^*$ is their product, then


$rH^* = (2kr - 2k + \lambda_1^* + \lambda_2^*) + (p_{12}^{1*} - p_{12}^{2*})(\lambda_1^* - \lambda_2^*)$

and

4.4.2

$r^2\Delta^* = (kr - k + \lambda_1^*)(kr - k + \lambda_2^*) + (\lambda_1^* - \lambda_2^*)[k(r - 1)(p_{12}^{1*} - p_{12}^{2*}) + \lambda_2^*p_{12}^{1*} - \lambda_1^*p_{12}^{2*}]$.


If the dual of the design we are working with is tabulated in Bose,
Clatworthy and Shrikhande [3] then H* and $\Delta^*$ are the H and $\Delta$
tabulated there for this dual design.

We now evaluate the required sums of squares. Remember from 2.4 that
z = M'p where M is an orthogonal matrix such that M'DM is a diagonal matrix
with the characteristic roots of D in the diagonal. Thus if we deal with a
partially linked design with two associative classes, then according to the
results of this paragraph we may choose M so that


4.4.3    $M'DM = \begin{pmatrix} e'I_{p_1} & 0 & 0 \\ 0 & e''I_{p_2} & 0 \\ 0 & 0 & 0 \end{pmatrix} = D^*$,


say, where the subscript on the identity matrix indicates its dimensionality.
In this case we find that


$\sum \frac{z_i^2}{e_i + e_i^2\mu}$


is somewhat simplified. It is


4.4.4    $\frac{\Sigma_1}{e' + (e')^2\mu} + \frac{\Sigma_2}{e'' + (e'')^2\mu}$


where


$\Sigma_1 = \sum_{i=1}^{p_1} z_i^2$ and $\Sigma_2 = \sum_{i=p_1+1}^{b-1} z_i^2$.
Now suppose we consider the quadratic form

4.4.5    $m'(D - e'I)p$

where m is a solution of

4.4.6    $Dm = p$.

We make the substitutions p = Mz and n = M'm or m = Mn.

$Dm = Mz, \quad DMn = Mz, \quad M'DMn = z, \quad D^*n = z,$

$z_i = \begin{cases} e'n_i; & n_i = z_i/e', & i = 1, \ldots, p_1 \\ e''n_i; & n_i = z_i/e'', & i = p_1 + 1, \ldots, b - 1 \\ 0 & & i = b. \end{cases}$


Now using the above relationships it can be verified that


$m'(D - e'I)p = \frac{e'' - e'}{e''}\Sigma_2$


so that $\Sigma_2 = m'(D - e'I)p\,e''/(e'' - e')$ but
$m'(D - e'I)p = p'p - e'm'p$ and


4.4.7    $\Sigma_2 = \frac{e''}{e'' - e'}(p'p - e'm'p)$.


Similarly,


$\Sigma_1 = \frac{e'}{e' - e''}(p'p - e''m'p)$;


m'p is called the block adjusted sum of squares and p'p is the sum of squares
of the adjusted block totals.

We may now calculate $\mu_1$ and $\mu_2$ for partially linked designs. To do so
we must solve the equations:


4.4.8    $F(\mu) = F_s$,


4.4.9    $\frac{\Sigma_1}{e' + (e')^2\mu} + \frac{\Sigma_2}{e'' + (e'')^2\mu} = a_s$,

where

4.4.10    $a_s = F_s\sum Y_j^2\,(b - 1)/(N - b - u + 1)$.

Now

4.4.11    $\Sigma_1[e'' + (e'')^2\mu] + \Sigma_2[e' + (e')^2\mu] = a_s[e' + (e')^2\mu][e'' + (e'')^2\mu]$,

4.4.12    $a_se'e''\mu^2 + \left[-\Sigma_1\frac{e''}{e'} - \Sigma_2\frac{e'}{e''} + a_s(e' + e'')\right]\mu - \frac{\Sigma_1}{e'} - \frac{\Sigma_2}{e''} + a_s = 0$.


Hence a root of 4.4.9 must also satisfy 4.4.12. Now we may see from 4.4.11 that
$\mu = -1/e''$ and $\mu = -1/e'$ can not be roots of 4.4.12 and hence we may
reverse the steps which brought us from 4.4.9 to 4.4.12 and find that a root of
4.4.12 is a root of 4.4.9.

Denote the left side of 4.4.9 by g(u) and graph this function. 


Since $a_s$ is non-negative we see that for all $a_s \ne 0$, there are two
values of $\mu$ that satisfy $g(\mu) = a_s$. Only one of these values may be
positive; hence if one of the two roots of equation 4.4.12 is positive, then
$\mu_s$ is the larger of them.
Now write 4.4.12 as follows:

4.4.13    $b_s\mu^2 + c_s\mu + d_s = 0$.

Here

$b_s = a_se'e'' = a_s\Delta^*$,

$c_s = a_s(e' + e'') - \Sigma_1e''/e' - \Sigma_2e'/e''$
$\quad = a_s(e' + e'') + p'p - m'p(e' + e'')$
$\quad = (a_s - m'p)H^* + p'p$,

$d_s = a_s - \frac{\Sigma_1}{e'} - \frac{\Sigma_2}{e''} = a_s - \frac{p'p - e''m'p}{e' - e''} - \frac{p'p - e'm'p}{e'' - e'} = a_s - m'p$.


Hence

4.4.16    $\mu_s = \frac{-c_s + \sqrt{c_s^2 - 4b_sd_s}}{2b_s}$


since this is the larger of the two possible roots.
It should be reiterated that the confidence bound in Theorem 1 is given by
4.4.16 only if $\mu_s$ is positive, otherwise a zero is substituted for $\mu_s$
in the confidence interval.

The test of Theorem 2 is then performed by accepting $\mu \le \mu_0$ if


$\frac{N - b - u + 1}{b - 1}\cdot\frac{1}{\sum Y_j^2}\cdot\frac{m'p[1 + \mu_0(e' + e'')] - \mu_0p'p}{(1 + e'\mu_0)(1 + e''\mu_0)}$


is less than c and accepting $\mu > \mu_0$ otherwise. Here c is again chosen to
be the value of an F variate with b - 1 and N - b - u + 1 degrees of freedom
which has $1 - \alpha$ as its cumulative distribution ordinate.
A GENERALIZATION OF A PRELIMINARY TESTING PROCEDURE FOR
POOLING DATA


By D. V. HUNTSBERGER


Iowa State College

1. Summary. This paper is concerned with a generalization of the sometimes- 
pool procedure for pooling two estimators which is based on a preliminary test 
of significance. A weighted estimator for one of the parameters is obtained by 
using weights which are determined by the observed value of the preliminary 
test statistic. The efficiencies of the weighting procedure and of the sometimes- 
pool procedure are compared for the special case where the estimators are nor- 
mally distributed. Further, it is shown that the weighting procedure offers a 
greater degree of control over the disturbances which may result from pooling
than does the sometimes-pool procedure. Some problems concerning the choice 
of a weighting function are discussed. 


2. Introduction. The effects of preliminary tests of significance on subsequent 
statistical inferences have been studied in various special cases by Bancroft 
[1], [2], [3], Bechhofer [4], Mosteller [8], and Paull [9]. They found that the use 
of such tests introduces serious disturbances into the final inferences. These dis- 
turbances take the forms of biased estimates, losses of efficiency as regards esti- 
mation, or shifts in the sizes and powers of tests of hypotheses. 

Preliminary testing procedures may be characterized as follows: A statistic, 
T, is evaluated from the data at hand. If T is not significant at some preassigned 
level of significance, a given procedure is used to estimate the parameter in
question or to test the major hypothesis. If T is significant, an alternative
procedure is used for obtaining estimates or for testing the hypothesis. In any
event, the only information derived from T is that it does or does not fall into the region of
rejection. If more of the information contained in T is utilized, it is possible to 
exert more control over the disturbances inherent in the preliminary testing 
procedures than is possible by merely altering the level of significance of the 
preliminary test. 

Let $X_1, \ldots, X_n$ be a random sample with joint probability density function


$f(X_1, \ldots, X_n; \theta_1, \ldots, \theta_k)$,


where the functional form is known, $\theta_1$ and $\theta_2$ are unknown
parameters, and the last k - 2 $\theta$'s are parameters whose values may or
may not be known. Let $\hat\theta_1$ and $\hat\theta_2$ be the best estimators
of $\theta_1$ and $\theta_2$ as provided by statistical theory. If
$\theta_1 = \theta_2$, a pooled estimator $g(\hat\theta_1, \hat\theta_2)$ will,
in general, provide an estimator for $\theta_1$ which is better in some sense
than $\hat\theta_1$. When it is not known whether or not $\theta_1$ and
$\theta_2$ are equal, a better estimate for $\theta_1$ may still be obtained by
making use of any information provided by $\hat\theta_2$.

Let T be the statistic which the theory indicates will provide the best test of
the hypothesis that $\theta_1 = \theta_2$ against the class of alternatives
$\theta_1 \ne \theta_2$. Evaluate T using the data at hand, and for an estimator
of $\theta_1$ use the function


(2.1)    $W(T) = \phi(T)\hat\theta_1 + [1 - \phi(T)]g(\hat\theta_1, \hat\theta_2)$,

where $\phi(T)$ is a function of T only. If $\phi(T)$ is defined as


(2.2)    $\phi(T) = 0$, $T \subset A_\alpha$,
         $\phi(T) = 1$, $T \subset R_\alpha$,


where $A_\alpha$ and $R_\alpha$ are the acceptance and rejection regions for the
test of $H_0$ with probability of type I error equal to $\alpha$, then W(T)
reduces to the estimator, SP(T), following from the “sometimes-pool” procedure
based upon the preliminary test of significance.

In order to determine whether or not W(T) offers any advantages over SP(T)
or $\hat\theta_1$ as an estimator for $\theta_1$, the mean square deviation,
$D^2$, about the true parameter value is used as a criterion of goodness.


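3. Pooling normal estimators. Let $\hat\theta_1$ and $\hat\theta_2$ be two
independent, unbiased, normally distributed estimators for $\theta_1$ and
$\theta_2$ respectively. Let $\hat\theta_1$ and $\hat\theta_2$ have known
variances, $\sigma_1^2$ and $\sigma_2^2$, respectively. A pooled estimator for
$\theta_1$ is obtained in this special case as

(3.1)    $W(T) = \phi(T)\hat\theta_1 + [1 - \phi(T)]\frac{\sigma_2^2\hat\theta_1 + \sigma_1^2\hat\theta_2}{\sigma_1^2 + \sigma_2^2}$

where $\phi(T)$ is a function of T only and

(3.2)    $T = \frac{\hat\theta_1 - \hat\theta_2}{\sqrt{\sigma_1^2 + \sigma_2^2}}$


is normally distributed with mean


(3.3)    $\gamma = \frac{\theta_1 - \theta_2}{\sqrt{\sigma_1^2 + \sigma_2^2}}$


and variance one. Equation (3.1) may be written as


(3.4)    $W(T) = \frac{\sigma_2^2\hat\theta_1 + \sigma_1^2\hat\theta_2}{\sigma_1^2 + \sigma_2^2} + \frac{\sigma_1^2\,T\phi(T)}{\sqrt{\sigma_1^2 + \sigma_2^2}}$.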
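The mean square deviation of W(T) about $\theta_1$ is a function of the
nuisance parameter $\gamma$ and is given by

(3.5)    $D_W^2(\gamma) = E[W(T) - \theta_1]^2 = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\left\{\sigma_2^2 + \sigma_1^2\int_{-\infty}^{\infty}[T\phi(T) - \gamma]^2N(T - \gamma)\,dT\right\}$,

where

$N(T) = \frac{1}{\sqrt{2\pi}}e^{-T^2/2}$.


The bias of W(T) as an estimator for $\theta_1$ is


(3.6)    $B_W(\gamma) = \theta_1 - E[W(T)] = \frac{\sigma_1^2}{\sqrt{\sigma_1^2 + \sigma_2^2}}\left[\gamma - \int_{-\infty}^{\infty}T\phi(T)N(T - \gamma)\,dT\right]$.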
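
For a weighting function with no convenient closed form, the mean square
deviation and bias can be evaluated by numerical integration. The following
sketch is an illustration only; it assumes the mean square deviation has the
form $\frac{\sigma_1^2}{\sigma_1^2+\sigma_2^2}\{\sigma_2^2 + \sigma_1^2\int
[T\phi(T) - \gamma]^2N(T - \gamma)\,dT\}$ and the bias the form
$\frac{\sigma_1^2}{\sqrt{\sigma_1^2+\sigma_2^2}}[\gamma - \int T\phi(T)N(T -
\gamma)\,dT]$, and uses Simpson's rule over $\gamma \pm 10$. The function names
and the truncation width are assumptions of the example.

```python
import math

def normal_pdf(x):
    """Standard normal density N(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def expect(h, gamma, half_width=10.0, n=4000):
    """Simpson's rule for the integral of h(T) N(T - gamma) over gamma +/- half_width."""
    step = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        T = gamma - half_width + i * step
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * h(T) * normal_pdf(T - gamma)
    return total * step / 3.0

def mse_W(phi, gamma, var1, var2):
    """Mean square deviation of W(T) about theta_1 for weighting function phi."""
    inner = expect(lambda T: (T * phi(T) - gamma) ** 2, gamma)
    return var1 / (var1 + var2) * (var2 + var1 * inner)

def bias_W(phi, gamma, var1, var2):
    """Bias theta_1 - E[W(T)] for weighting function phi."""
    return var1 / math.sqrt(var1 + var2) * (gamma - expect(lambda T: T * phi(T), gamma))

never_pool = lambda T: 1.0       # phi = 1: W(T) is the unpooled estimator
always_pool = lambda T: 0.0      # phi = 0: W(T) is the pooled estimator
print(mse_W(never_pool, 0.7, 2.0, 3.0), mse_W(always_pool, 0.7, 2.0, 3.0))
```

The two limiting cases give a quick check: with $\phi = 1$ the mean square
deviation is $\sigma_1^2$ and the bias vanishes, while with $\phi = 0$ the mean
square deviation is $\sigma_1^2(\sigma_2^2 + \sigma_1^2\gamma^2)/(\sigma_1^2 +
\sigma_2^2)$.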


4. Weighting functions. In order to use the estimator given by the weighting
procedure (3.4), the weighting function $\phi(T)$ must first be selected. The
choice of $\phi(T)$ will be restricted to the class of single-valued functions
of T which are continuous except on a set of measure zero, which are defined
for all T, and which satisfy the conditions:

(i) $0 \le \phi(T) \le 1$, for all T,

(ii) $\phi(-T) = \phi(T)$.

The class of functions so defined will be referred to as the admissible class of 
weighting functions. 

The choice of a weighting function should be based on some criterion by means
of which the relative merits of various alternative functions may be assessed.
A possible criterion is unbiasedness; that is, if a function $\phi_u(T)$ exists
such that for all $\gamma$


$E[W_u(T)] = \theta_1$


then $\phi_u(T)$ is an unbiased weighting function.

THEOREM 1. Among the class of admissible weighting functions the only unbiased
weighting function is $\phi(T) = 1$.

PROOF. Because of the symmetry of $\phi(T)$ the bias of W(T), equation (3.6),
may be put into the form


$B_W(\gamma) = \frac{\sigma_1^2}{\sqrt{\sigma_1^2 + \sigma_2^2}}\int_0^{\infty}T[1 - \phi(T)][N(T - \gamma) - N(T + \gamma)]\,dT.$

It is obvious that this is equal to zero when $\gamma$ is not equal to zero if
and only if $\phi(T)$ is identically equal to one.

A second desirable property would be uniformly minimum mean square error
about $\theta_1$. If $\phi_m(T)$ is an admissible weighting function such that
$D_{W_m}^2(\gamma) \le D_{W'}^2(\gamma)$ for every $\phi'$ and every $\gamma$,
with inequality holding for at least one $\gamma$ and one $\phi'$, then
$\phi_m(T)$ is a uniformly minimum mean square error weighting function. It
will be shown in Theorem 3 that such a function does not exist.
A third criterion which might be proposed is one which selects a weighting
function which yields an estimator whose efficiency is greatest when averaged
in some sense for all $\gamma$. One such measure of overall efficiency is the
area between the curves corresponding to $D_W^2(\gamma)$ and to the variance,
$\sigma_1^2$, of the “never-pool” estimator, $\hat\theta_1$. The following
theorem is obvious and is stated without proof.

THEOREM 2. Among the class of admissible weighting functions, that one which
maximizes the integral


(4.1)    $J = \int_{-\infty}^{\infty}[\sigma_1^2 - D_W^2(\gamma)]\,d\gamma$


is $\phi(T) = 1$.

As a consequence of Theorem 2 and the fact that if $\gamma$ is zero the minimum
variance estimator is obtained by letting $\phi(T) = 0$ the following theorem
may be stated.


THEOREM 3. Among the class of admissible weighting functions there exists no
function $\phi_m(T)$ such that


$D_{W_m}^2(\gamma) \le D_{W'}^2(\gamma)$


for every $\phi'$ and every $\gamma$.
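Suppose that $\gamma$ is fixed and we consider an estimate


$A\hat\theta_1 + (1 - A)\frac{\sigma_2^2\hat\theta_1 + \sigma_1^2\hat\theta_2}{\sigma_1^2 + \sigma_2^2}$

where A can be a function of $\gamma$. Its mean square deviation about
$\theta_1$ is

$\frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\left\{\sigma_2^2 + \sigma_1^2[A^2 + (1 - A)^2\gamma^2]\right\},$


and this is minimized with respect to A when $A = \gamma^2/(1 + \gamma^2)$.
Since $\gamma$ is considered to be unknown and since T is an unbiased estimator
for $\gamma$, it was decided to estimate A by


(4.2)    $\phi_0(T) = \frac{T^2}{1 + T^2}$.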
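
The minimizing weight can be confirmed by a direct search. The sketch below is
a hypothetical check, assuming the mean square deviation for a fixed weight A
is $\frac{\sigma_1^2}{\sigma_1^2+\sigma_2^2}\{\sigma_2^2 + \sigma_1^2[A^2 +
(1 - A)^2\gamma^2]\}$; the grid, the variances and the value of $\gamma$ are
arbitrary choices made for the illustration.

```python
def mse_fixed_weight(A, gamma, var1, var2):
    """Mean square deviation about theta_1 when phi(T) is the constant A."""
    return var1 / (var1 + var2) * (var2 + var1 * (A * A + (1.0 - A) ** 2 * gamma * gamma))

gamma, var1, var2 = 1.3, 2.0, 3.0            # hypothetical values
best_A = gamma ** 2 / (1.0 + gamma ** 2)     # claimed minimizer

# Search a grid of weights in [0, 1] with spacing 0.001.
grid = [i / 1000.0 for i in range(1001)]
grid_best = min(grid, key=lambda A: mse_fixed_weight(A, gamma, var1, var2))
print(best_A, grid_best)
```

The grid minimum agrees with $\gamma^2/(1 + \gamma^2)$ to within the grid
spacing, and the minimizer does not depend on the two variances.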


5. Mean square and bias when $\phi(T) = \phi_0(T)$. If $\phi_0(T) = T^2/(1 + T^2)$ is
substituted for $\phi(T)$ in the expression (3.5) for the mean square deviation,


(5.1)    $D_{W_0}^2(\gamma) = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\left\{\sigma_2^2 + \sigma_1^2\int_{-\infty}^{\infty}\left[\frac{T^3}{1 + T^2} - \gamma\right]^2N(T - \gamma)\,dT\right\}$.


The integral in (5.1) can be put into the form


(5.2)    $I(\gamma) = 1 + 3H(\gamma) - 5G(\gamma)$,

where

(5.3)    $H(\gamma) = \int_{-\infty}^{\infty}\frac{1}{1 + T^2}N(T - \gamma)\,dT$,


(5.4)    $G(\gamma) = \int_{-\infty}^{\infty}\frac{1}{(1 + T^2)^2}N(T - \gamma)\,dT$.


A series solution for the integral $H(\gamma)$ was found by using a method given by
Zemansky [10].


(5.5) H(y) = 
wo —22/2 
ia é7 | o.- dz, 
1 =" 
aa a ‘| ee dz, 
1 


l 


2n — 1 


(1 ro An i). 
Since


$G(\gamma) = \frac12\left[1 - \gamma\frac{\partial H(\gamma)}{\partial\gamma} - \gamma^2H(\gamma)\right]$,


a series solution for $G(\gamma)$ is obtained from that of $H(\gamma)$, (5.5).


(5.6) Gy) = 1 + LL (Ana — Ao], 


n=l & 


and substitution into (5.1) gives 


4 —y72/2 
i . o\e 
(5.7) Dw.(y) = oi + at o% [3. lo — : 2 om — (44. 4) | 


A similar procedure gives the bias of W(T) when $\phi(T) = \phi_0(T)$ as
oy é i [2ny2n Ant 
— oh = Tatas, oa 


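6. Equal variances. If $\sigma_1^2 = \sigma_2^2 = \sigma^2$, the estimator for
$\theta_1$ reduces to


(6.1)    $W(T) = \frac{\hat\theta_1 + \hat\theta_2}{2} + \frac{\sigma}{\sqrt2}T\phi(T)$,

where

$T = \frac{\hat\theta_1 - \hat\theta_2}{\sigma\sqrt2}$.


Let $\phi_0(T) = T^2/(1 + T^2)$, then the mean square deviation of $W_0(T)$ is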


Sr ™ 
(62) Div) = 54246" a i see pe S a3 es 


and the bias is


Yo wp 7" 
(6.3) By,(y) = Va° 7 4 aq Ant: 


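The “sometimes-pool” estimator for $\theta_1$ is obtained if we let


$\phi(T) = \begin{cases} 0 & \text{for } |T| < t_\alpha, \\ 1 & \text{for } |T| \ge t_\alpha, \end{cases}$


where $P(|t| \ge t_\alpha) = \alpha$ and t is the standard normal deviate.
Mosteller [8] gave the mean square of SP(T).


(6.4)    $D_{SP}^2(\gamma) = \left(\frac{\sigma^2}{2}\right)\left\{2 + (t_\alpha + \gamma)N(t_\alpha + \gamma) + (t_\alpha - \gamma)N(t_\alpha - \gamma) + (\gamma^2 - 1)\int_{-t_\alpha}^{t_\alpha}N(T - \gamma)\,dT\right\}$.


The bias of SP(T) is obtained from the results reported by Bennett [5].


(6.5)    $B_{SP}(\gamma) = \frac{\sigma}{\sqrt2}\left[\gamma\int_{-t_\alpha}^{t_\alpha}N(T - \gamma)\,dT + N(t_\alpha + \gamma) - N(t_\alpha - \gamma)\right]$.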
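
The closed forms for the sometimes-pool estimator are easy to evaluate. The
following sketch is an illustration under the assumption that the mean square
is $(\sigma^2/2)\{2 + (t_\alpha + \gamma)N(t_\alpha + \gamma) + (t_\alpha -
\gamma)N(t_\alpha - \gamma) + (\gamma^2 - 1)P\}$ and the bias is
$(\sigma/\sqrt2)[\gamma P + N(t_\alpha + \gamma) - N(t_\alpha - \gamma)]$, where
P is the probability that $|T| < t_\alpha$. The function names are
hypothetical, and the normal integral is obtained from `math.erf`.

```python
import math

def normal_pdf(x):
    """Standard normal density N(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def accept_prob(gamma, t):
    """P(|T| < t) when T is normal with mean gamma and variance one."""
    return normal_cdf(t - gamma) - normal_cdf(-t - gamma)

def mse_SP(gamma, t, var=1.0):
    """Mean square of the sometimes-pool estimator, equal variances."""
    return var / 2.0 * (2.0 + (t + gamma) * normal_pdf(t + gamma)
                        + (t - gamma) * normal_pdf(t - gamma)
                        + (gamma * gamma - 1.0) * accept_prob(gamma, t))

def bias_SP(gamma, t, var=1.0):
    """Bias of the sometimes-pool estimator, equal variances."""
    return math.sqrt(var / 2.0) * (gamma * accept_prob(gamma, t)
                                   + normal_pdf(t + gamma) - normal_pdf(t - gamma))

def efficiency_SP(gamma, t):
    """Efficiency relative to the unpooled estimator, sigma^2 / D^2."""
    return 1.0 / mse_SP(gamma, t, 1.0)

for gamma in (0.0, 1.0, 2.0, 3.0):
    print(gamma, efficiency_SP(gamma, 1.6), bias_SP(gamma, 1.6))
```

Setting $t_\alpha = 0$ reproduces the never-pool estimator (mean square
$\sigma^2$), and for very large $|\gamma|$ the mean square again approaches
$\sigma^2$ because the preliminary test almost always rejects.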


The two estimators were compared on the basis of their efficiencies relative
to $\hat\theta_1$. These efficiencies are plotted as functions of $|\gamma|$ in
Figure 1. Let $\gamma^*$ be defined as the largest value of $|\gamma|$ such that
for all $|\gamma| < \gamma^*$ the efficiency is greater than one. $\gamma^*$
will be referred to as the effective difference of the estimator. The curves of
Figure 1 indicate that: (1) the maximum possible loss of efficiency is smaller
for $W_0(T)$ than for SP(T); (2) the effective difference is greater for
$W_0(T)$; and (3) for larger values of $|\gamma|$, SP(T) is more efficient than
$W_0(T)$.

FIG. 1. Efficiencies of SP(T) and $W_0(T)$. Curves: (a) SP(T), $t_\alpha = 1.6$;
(b) $W_0(T)$. Vertical axis: efficiency relative to $\hat\theta_1$; horizontal
axis: $|\gamma|$.


7. Two-parameter weighting functions. In order to study the effects of
changing the shape of the weighting function curve, a two-parameter family of
weighting functions is defined as


(7.1)    $\phi(T; a, b) = 1 - ae^{-bT^2}$,


where the parameters have ranges
(i) $0 \le a \le 1$,

(ii) $b \ge 0$.
The mean square of $W_{ab}(T)$ is


$D_{ab}^2(\gamma) = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\left\{\sigma_2^2 + \sigma_1^2\int_{-\infty}^{\infty}[\gamma - T(1 - ae^{-bT^2})]^2N(T - \gamma)\,dT\right\}$.


Integration yields


(7.2)    $D_{ab}^2(\gamma) = \sigma_1^2 + \frac{\sigma_1^4}{\sigma_1^2 + \sigma_2^2}\left\{\frac{a^2}{(4b + 1)^{3/2}}\left[1 + \frac{\gamma^2}{4b + 1}\right]e^{-2b\gamma^2/(4b+1)} - \frac{2a}{(2b + 1)^{3/2}}\left[1 - \frac{2b\gamma^2}{2b + 1}\right]e^{-b\gamma^2/(2b+1)}\right\}$.


For the special case where $\sigma_1^2 = \sigma_2^2 = \sigma^2$


(7.3)    $D_{ab}^2(\gamma) = \frac{\sigma^2}{2}\left\{2 - \frac{2a}{(2b + 1)^{3/2}}\left[1 - \frac{2b\gamma^2}{2b + 1}\right]e^{-b\gamma^2/(2b+1)} + \frac{a^2}{(4b + 1)^{3/2}}\left[1 + \frac{\gamma^2}{4b + 1}\right]e^{-2b\gamma^2/(4b+1)}\right\}$


and the efficiency of $W_{ab}(T)$ relative to $\hat\theta_1$ is
$\sigma^2/D_{ab}^2(\gamma)$. This was evaluated for various values of a and b.

In Figure 2 the efficiency curves are plotted as functions of $|\gamma|$ for
a = 1.00 and b = 1.00, .50, .25, .10. When a is fixed, decreasing b has the
following effects: (1) for $|\gamma| = 0$ the efficiency increases; (2) the
maximum possible loss of efficiency is increased; and (3) the range of
$|\gamma|$ for which large losses of efficiency may be sustained is increased
without a corresponding increase in the effective difference.

The curves of Figure 3 are the efficiencies of $W_{ab}(T)$ for b = .10 and
a = 1.00, .65, .42. They reveal that as a decreases, b fixed: (1) the efficiency
at $|\gamma| = 0$ decreases; (2) the maximum possible loss decreases; and (3)
the effective difference increases.
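
The equal-variance mean square for the two-parameter family can be checked
against direct numerical integration. The sketch below is an illustration
only, assuming the closed form $\frac{\sigma^2}{2}\{2 -
\frac{2a}{(2b+1)^{3/2}}[1 - \frac{2b\gamma^2}{2b+1}]e^{-b\gamma^2/(2b+1)} +
\frac{a^2}{(4b+1)^{3/2}}[1 + \frac{\gamma^2}{4b+1}]e^{-2b\gamma^2/(4b+1)}\}$;
the function names and the Simpson's rule settings are assumptions of the
example.

```python
import math

def mse_ab(gamma, a, b, var=1.0):
    """Closed-form mean square of W_ab(T) for equal variances."""
    g2 = gamma * gamma
    term1 = (2.0 * a / (2.0 * b + 1.0) ** 1.5
             * (1.0 - 2.0 * b * g2 / (2.0 * b + 1.0))
             * math.exp(-b * g2 / (2.0 * b + 1.0)))
    term2 = (a * a / (4.0 * b + 1.0) ** 1.5
             * (1.0 + g2 / (4.0 * b + 1.0))
             * math.exp(-2.0 * b * g2 / (4.0 * b + 1.0)))
    return var / 2.0 * (2.0 - term1 + term2)

def mse_ab_numeric(gamma, a, b, var=1.0, half_width=10.0, n=4000):
    """Same quantity by Simpson's rule applied to the defining integral."""
    step = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        T = gamma - half_width + i * step
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        dev = gamma - T * (1.0 - a * math.exp(-b * T * T))
        density = math.exp(-0.5 * (T - gamma) ** 2) / math.sqrt(2.0 * math.pi)
        total += weight * dev * dev * density
    return var / 2.0 * (1.0 + total * step / 3.0)

def efficiency_ab(gamma, a, b):
    """Efficiency of W_ab(T) relative to the unpooled estimator."""
    return 1.0 / mse_ab(gamma, a, b, 1.0)

for gamma in (0.0, 1.0, 2.0, 3.0):
    print(gamma, efficiency_ab(gamma, 0.42, 0.10))
```

Two limiting cases serve as checks: a = 0 gives the never-pool estimator with
mean square $\sigma^2$, and a = 1, b = 0 gives the always-pool estimator with
mean square $(\sigma^2/2)(1 + \gamma^2)$.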


FIG. 2. Efficiency of $W_{ab}(T)$ for a = 1.00. Curves: (1) b = 1.00; (2) b = .50;
(3) b = .25; (4) b = .10. Vertical axis: efficiency relative to
$\hat\theta_1$; horizontal axis: $|\gamma|$.

FIG. 3. Efficiency of $W_{ab}(T)$ for b = .10. Curves: (1) a = .42; (2) a = .65;
(3) a = 1.00. Vertical axis: efficiency relative to $\hat\theta_1$; horizontal
axis: $|\gamma|$.


8. Comparison of $W_0(T)$, $W_{ab}(T)$, and SP(T). The relative efficiencies of
these three estimators for $\theta_1$ in the case of equal variances are plotted
as functions of $|\gamma|$ in Figure 4. The constants involved were selected so
that the efficiencies of all three would be very nearly equal for
$|\gamma| = 0$. The following facts are apparent: (1) $W_{ab}(T)$ provides the
greatest effective difference, SP(T) the smallest; (2) the maximum loss is
least for $W_0(T)$, greatest for SP(T); and (3) the range of $|\gamma|$ for
which large losses occur is shortest for SP(T).

FIG. 4. Efficiencies of SP(T), $W_0(T)$, and $W_{ab}(T)$. Curves: (a) SP(T),
$t_\alpha = 1.6$; (b) $W_0(T)$; (c) $W_{ab}(T)$, a = .42, b = .1. Vertical axis:
efficiency relative to $\hat\theta_1$; horizontal axis: $|\gamma|$.

To compare these estimators on the basis of overall efficiency as defined in
Section 4 the integral


$J = -\sigma^2\int_0^{\infty}T^2[1 - \phi(T)]^2\,dT$


was evaluated for each of the weighting functions plotted in Figure 4 with the
following results, the largest value corresponding to the greatest overall
efficiency; $J_0 = -.785\sigma^2$, $J_{ab} = -.875\sigma^2$,
$J_{SP} = -1.365\sigma^2$. Of the three, $W_0(T)$ has the greatest overall
efficiency.
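
The three values of J can be reproduced numerically. The sketch below is an
illustration, assuming that for equal variances the integral has the form
$J = -\sigma^2\int_0^\infty T^2[1 - \phi(T)]^2\,dT$; it maps the half line onto
$(0, \pi/2)$ with the substitution $T = \tan\theta$ and applies the midpoint
rule. The function names and the number of subintervals are arbitrary choices.

```python
import math

def overall_J(phi, n=200_000):
    """J / sigma^2 = -integral of T^2 [1 - phi(T)]^2 over T >= 0 (T = tan theta)."""
    h = (math.pi / 2.0) / n
    total = 0.0
    for i in range(n):
        T = math.tan((i + 0.5) * h)
        w = 1.0 - phi(T)
        total += T * T * w * w * (1.0 + T * T)   # dT = (1 + T^2) d(theta)
    return -total * h

def phi_0(T):
    """Weighting function T^2 / (1 + T^2)."""
    return T * T / (1.0 + T * T)

def phi_ab(T, a=0.42, b=0.10):
    """Two-parameter weighting function 1 - a exp(-b T^2)."""
    return 1.0 - a * math.exp(-b * T * T)

def phi_sp(T, t_alpha=1.6):
    """Sometimes-pool weighting function: 0 inside the acceptance region, else 1."""
    return 0.0 if abs(T) < t_alpha else 1.0

J_0, J_ab, J_sp = overall_J(phi_0), overall_J(phi_ab), overall_J(phi_sp)
print(round(J_0, 3), round(J_ab, 3), round(J_sp, 3))
```

Under the same assumption the integrals also have simple closed forms,
$\pi/4$, $a^2\sqrt{\pi}/[4(2b)^{3/2}]$ and $t_\alpha^3/3$ respectively, which
agree with the tabulated values to the accuracy quoted.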


9. Discussion. The results of Sections 6, 7, and 8 indicate that in the case of 
normal estimators the generalized pooling procedure is effective in reducing the 
maximum loss of efficiency and increasing the effective difference. 

Since it was shown in Section 4 that there is no uniformly minimum mean
square error weighting function and no unbiased weighting function, the choice
of a function, $\phi(T)$, might be based on one of the following criteria or on
a combination of them:

(1) Select a weighting function which will provide a small maximum loss.

(2) Choose $\phi(T)$ so as to have a large overall efficiency.

(3) Select $\phi(T)$ to give a large effective difference.

(4) Select $\phi(T)$ so that the gain in efficiency is large when $\gamma = 0$.

It is readily apparent that these criteria are not independent and that mini- 
mizing the maximum loss or maximizing with respect to any one of the last 
three will, in general, lead to the never-pool estimator or will have adverse 
effects on the other characteristics of the estimator. Of the functions which were 
studied it appears that $\varphi_0(T)$ is the best compromise when nothing is known
concerning the size of the nuisance parameter y. Any prior knowledge concern- 
ing y might conceivably be used as an aid in selecting one of several possible 
alternative functions. 

It is realized that only a beginning has been made on the applications and 
effects of the generalized pooling procedure and that the problems which were 
considered in this investigation belong to the simplest class of problems to which 
the procedure might be applied. The author feels, however, that the results 
which have been achieved here indicate that the procedure should be effective 
in controlling some of the disturbances which arise in other more complex ap- 
plications of preliminary tests of significance. He feels that the advantages 
claimed for the weighting procedure in this study warrant further investigations 
along two lines: (1) An investigation should be made into the operating charac- 
teristics of the procedure when used in the other problems for which the effects 
of a preliminary test have been studied; and (2) a more rigorous examination of 
possible weighting functions and rules for their selection should be considered. 
ON THE SELECTION OF n PRIMARY SAMPLING UNITS FROM A
STRATUM STRUCTURE (n ≥ 2)


A. R. Sen¹
Economics and Statistics Department, Uttar Pradesh, Lucknow, India 


1. Introduction and summary. Hansen and Hurwitz [1] showed for two-stage 
sampling that the selection of a single primary sampling unit (p.s.u.) with 
probability proportional to some measure of its size (p.p.s.) from each stratum 
is generally more efficient than selection with equal probability. Midzuno [5] 
generalised the Hansen and Hurwitz approach to sampling a combination of n 
elements from each stratum. Neither Hansen and Hurwitz nor Midzuno pro- 
vided a method for estimating the between component of the total error from 
the sample. Recently Horvitz and Thompson [3] have also given a method for
dealing with sampling without replacement when arbitrary probabilities of
selection are used for elements remaining prior to each draw. Methods for
obtaining an unbiased estimate of the population total as well as of the variance of the
estimate are presented. The scheme, however, suffers from certain practical
disadvantages. One such disadvantage is the difficulty involved in the determination
of the selection probabilities. Another disadvantage is that Horvitz and
Thompson's unbiased estimate of the variance has generally no practical application as
it may assume negative values. This has been shown independently by the present 
author [9], [10] and Yates and Grundy [11]. The authors also derived an expres- 
sion of the unbiased estimate which is free from this defect. 

Working independently, the present author [7], [8] developed the theory when 
a combination of n p.s.u.’s are sampled from a stratum and applied it to the 
case when n = 2. In this paper an outline is given of the general theory of the 
selection and estimation procedure for obtaining unbiased estimates of the 
between component of total error where first r p.s.u.’s are selected with p.p.s. 
and the remaining n — r are selected with equal probability, the selection being 
without replacement. An expression for the estimate of the variance of the 
estimated total is presented. It is shown that the unbiased estimate of the vari- 
ance of the estimate is generally inefficient and may assume negative values for 
certain combinations of the sample values except for the special case when the 
measures of sizes are all equal. A biased estimate has, however, been derived 
which is always positive and is more efficient than the unbiased estimate. It is 
shown that for the particular case when r = 1 the unbiased estimate of the total 
reduces to a simple form which is useful in practice. It is proved that the selection 
of one p.s.u. with p.p.s. and the remaining n − 1 with equal probability is
equivalent to selecting a combination of n p.s.u.'s, where the measure of size is the
sum of the measures of the combination.


2. Probability functions for selection with probability proportionate to size. 
Consider a population $U_1, \dots, U_N$ of units with respective measures of sizes
proportional to $X_1, \dots, X_N$. In particular, the X's may be previous census values
of the characteristic for the units known exactly. Let $(i_1, \dots, i_n)$ be any
arbitrary combination of size n taken from $1, \dots, N$. Obviously the total number of
possible combinations is $\binom{N}{n}$.

Let $Q(i_1, \dots, i_r)$ be the probability of selecting the units $U_{i_1}, \dots, U_{i_r}$ from
among the units $U_1, \dots, U_N$ with p.p.s. to $X_{i_1}, \dots, X_{i_r}$, the selection being
without replacement, each element being selected p.p.s. from those remaining
after the preceding selection.

THEOREM 1. $Q(i_1, \dots, i_r)$ is given by the recursive formula

(1)   $Q(i_1, \dots, i_r) = \sum_{j=1}^{r} \frac{X_{i_j}}{X}\, Q_{i_j}(i_1, \dots, i_{j-1}, i_{j+1}, \dots, i_r)$

where $X = \sum_{i=1}^{N} X_i$, and $Q_{i_j}$ indicates that $U_{i_j}$ is eliminated as a possible selection.

PROOF. The right hand side of (1) within the summation sign is the probability
of selecting the r units $U_{i_1}, \dots, U_{i_r}$ out of $U_1, \dots, U_N$ such that the $i_j$th
unit $U_{i_j}$ is first selected with p.p.s. to $X_{i_j}$ and the remaining r − 1 units
$U_{i_1}, \dots, U_{i_{j-1}}, U_{i_{j+1}}, \dots, U_{i_r}$ are next selected from the remaining N − 1 units
with p.p.s. to $X_{i_1}, \dots, X_{i_{j-1}}, X_{i_{j+1}}, \dots, X_{i_r}$ and without
replacement. The sum of all such probabilities where the $i_j$th unit $U_{i_j}$ may
be any one of the units $U_{i_1}, \dots, U_{i_r}$ is equal to $Q(i_1, \dots, i_r)$.

In particular,

$Q(i_1, i_2) = \frac{X_{i_1} X_{i_2}}{X} \left( \frac{1}{X - X_{i_1}} + \frac{1}{X - X_{i_2}} \right)$.
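
The recursion of Theorem 1 can be checked numerically. The following is a minimal sketch, not part of the original paper: the function names `q_recursive` and `q_pair` and the three size measures are hypothetical, chosen only for illustration. It evaluates (1) with exact fractions and compares the result for r = 2 with the closed form given above.

```python
from fractions import Fraction
from itertools import combinations


def q_recursive(sizes, chosen):
    """Q for the set `chosen`, following the recursion (1) of Theorem 1.

    `sizes` maps each unit still available to its size measure X_i;
    `chosen` is a frozenset of the units to be selected.
    """
    if not chosen:
        return Fraction(1)
    total = sum(sizes.values())  # X for the units still available
    prob = Fraction(0)
    for j in chosen:
        # Unit j is drawn first with p.p.s.; it is then removed from the pool.
        remaining = {u: x for u, x in sizes.items() if u != j}
        prob += Fraction(sizes[j], total) * q_recursive(remaining, chosen - {j})
    return prob


def q_pair(sizes, a, b):
    """Closed form of Q(i_1, i_2) for two units."""
    x = sum(sizes.values())
    xa, xb = sizes[a], sizes[b]
    return Fraction(xa * xb, x) * (Fraction(1, x - xa) + Fraction(1, x - xb))


# Hypothetical size measures for a population of three units.
sizes = {1: 2, 2: 3, 3: 5}
for a, b in combinations(sizes, 2):
    assert q_recursive(sizes, frozenset({a, b})) == q_pair(sizes, a, b)

# The probabilities of all possible pairs add to one.
total_probability = sum(
    q_recursive(sizes, frozenset(c)) for c in combinations(sizes, 2)
)
```

Exact rational arithmetic is used so that the comparison with the closed form is an identity rather than a floating-point approximation.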


THEOREM 2. The probability of selecting a specified n units such that any r units
$U_{i_1}, \dots, U_{i_r}$ are first selected with p.p.s. to $X_{i_1}, \dots, X_{i_r}$ and the remaining n − r
units with equal probability from among the remaining N − r units, the selection
being without replacement, is given by

(2)   $P(n, r, [i]) = \frac{1}{(N - r;\, n - r)} \sum_{(n;r)} Q(i_1, \dots, i_r)$

where $(N - r;\, n - r)$ is written for $\binom{N-r}{n-r}$ and $\sum_{(n;r)}$ denotes summation over all
possible combinations $(i_1, \dots, i_r)$ out of $(i_1, \dots, i_r, \dots, i_n)$. For simplicity
write P(n, r) for P(n, r, [i]). The proof is omitted.

Special Cases. 

The following are some important special cases: 

r = 2, n = n.

(3)   $P(n, 2) = \frac{1}{(N - 2;\, n - 2)} \sum_{(n;2)} \frac{X_{i_1} X_{i_2}}{X} \left( \frac{1}{X - X_{i_1}} + \frac{1}{X - X_{i_2}} \right)$

Thus P(n, 2) is the probability of selecting n p.s.u.'s such that the first two
units $U_{i_1}, U_{i_2}$ are selected with p.p.s. to $X_{i_1}, X_{i_2}$ and the remaining n − 2 units
with equal probability, the selection being made without replacement.

r = 1, n = n.


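(4)   $P(n, 1) = \frac{\sum_{j=1}^{n} X_{i_j}}{(N - 1;\, n - 1)\, X}$

(5)   $\phantom{P(n, 1)} = \sum_{j=1}^{n} \frac{X_{i_j}}{X} \cdot \frac{1}{(N - 1;\, n - 1)}$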

From (4) it follows that P(n, 1) is the probability of selecting a combination of
n units $U_{i_1}, \dots, U_{i_n}$ with probability proportional to the total measure of size
of the combination. Also from (5) P(n, 1) is the probability of selecting n p.s.u.'s
$U_{i_1}, \dots, U_{i_n}$ such that the first unit is selected with p.p.s. to measure of size
and the remaining n − 1 units with equal probability but without replacement.
Hence the selection of one p.s.u. with p.p.s. and the remaining n − 1 units with
equal probability and without replacement is equivalent to selecting a
combination of n p.s.u.'s with p.p.s. to measure of size of the combination.
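
The equivalence just stated can be illustrated by direct enumeration. The sketch below is an illustration only, under the assumption that P(n, r) is computed as in Theorem 2 (the sum of Q over the r-subsets of the sample, divided by the number of ways of choosing the remaining n − r units); the names `q`, `p_nr` and the size measures are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def q(sizes, chosen):
    """Probability that the set `chosen` is drawn by successive p.p.s. draws."""
    if not chosen:
        return Fraction(1)
    total = sum(sizes.values())
    prob = Fraction(0)
    for j in chosen:
        remaining = {u: x for u, x in sizes.items() if u != j}
        prob += Fraction(sizes[j], total) * q(remaining, chosen - {j})
    return prob


def p_nr(sizes, sample, r):
    """P(n, r): first r units with p.p.s., remaining n - r with equal probability."""
    big_n, n = len(sizes), len(sample)
    q_sum = sum(q(sizes, frozenset(c)) for c in combinations(sample, r))
    return q_sum / comb(big_n - r, n - r)


# Hypothetical size measures for N = 4 units; samples of n = 3.
sizes = {1: 2, 2: 3, 3: 5, 4: 10}
x_total = sum(sizes.values())
samples = list(combinations(sizes, 3))

# For r = 1 the probability is proportional to the total size of the sample.
proportional = {
    s: Fraction(sum(sizes[i] for i in s), comb(len(sizes) - 1, 2) * x_total)
    for s in samples
}
p_one = {s: p_nr(sizes, s, 1) for s in samples}
p_zero = {s: p_nr(sizes, s, 0) for s in samples}
```

With r = 0 the same function returns the equal-probability value for every sample, which is the first of the two further cases treated later in the paper.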


3. Unbiased systems. Let $T_p$ be any function of the observations on n elements
selected by some probability system from a population consisting of N elements.
In particular, let $T_p$ be an estimate of the population value Y with regard to the
probability function P(n, r). Then $[P(n, r), T_p]$ will be defined as a sampling
system.

A sampling system $[P(n, r), T_p]$ is said to be unbiased [4], [8] if


(6)   $E_{P(n,r)}(T_p) = Y$.


We will now consider the class of unbiased estimates where the value of the
auxiliary variate X correlated with that of the characteristic Y is known
beforehand from a complete census. In particular, X may be the value of the
characteristic Y at a previous census, not subject to any sampling error. For simplicity
of notation we will consider only one stratum.

Consider now a population total Y as the total of the population of different
units. Also let $Y_i$ be the population total for the ith unit. Consider a two-stage
sampling system in which n p.s.u.'s are selected out of N units from the stratum
such that the first r units are selected with p.p.s. and the remaining n − r units
with equal probability and without replacement. Let the selected n units be next
independently subsampled at random without replacement and the unbiased
estimated totals based on the subsamples be $y_{i_1}, \dots, y_{i_n}$.

THEOREM 3.

(7)   $\left[ P(n, r),\ \frac{\sum_{j=1}^{n} y_{i_j}}{(N - 1;\, n - 1)\, P(n, r)} \right]$

is an unbiased sampling system.
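PROOF. By Theorem 14, ch. 3 of [2] on conditional expectation,

$E\left[ \frac{\sum_{j=1}^{n} y_{i_j}}{(N - 1;\, n - 1)\, P(n, r)} \right] = E\left[ \frac{1}{(N - 1;\, n - 1)\, P(n, r)}\, E_2\left( \sum_{j=1}^{n} y_{i_j} \right) \right]$

where $E_2(\sum y_{i_j})$ is the conditional expectation of $\sum y_{i_j}$ holding the first
stage units constant. By Theorem 6, ch. 3 of [2],

$E_2\left( \sum_{j=1}^{n} y_{i_j} \right) = \sum_{j=1}^{n} E_2(y_{i_j}) = \sum_{j=1}^{n} Y_{i_j}$.

Hence

(8)   $E\left[ \frac{\sum_{j=1}^{n} y_{i_j}}{(N - 1;\, n - 1)\, P(n, r)} \right] = E\left[ \frac{\sum_{j=1}^{n} Y_{i_j}}{(N - 1;\, n - 1)\, P(n, r)} \right] = Y$.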

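By Theorem 15, ch. 3 of [2] on conditional variance,

$\operatorname{Var}\left[ \frac{\sum_{j=1}^{n} y_{i_j}}{(N - 1;\, n - 1)\, P(n, r)} \right] = E\left[ E_2\left\{ \frac{\sum_{j=1}^{n} (y_{i_j} - Y_{i_j})}{(N - 1;\, n - 1)\, P(n, r)} \right\}^2 \right] + E\left[ \left\{ \frac{Y_{i_1} + \dots + Y_{i_n}}{(N - 1;\, n - 1)\, P(n, r)} - Y \right\}^2 \right]$

where $E_2\{ \sum (y_{i_j} - Y_{i_j}) / (N - 1;\, n - 1) P(n, r) \}^2$ is the conditional variance
of $\sum y_{i_j} / (N - 1;\, n - 1) P(n, r)$ holding the first stage units constant.
Hence

(9)   $\operatorname{Var}\left[ \frac{\sum_{j=1}^{n} y_{i_j}}{(N - 1;\, n - 1)\, P(n, r)} \right] = \underbrace{\frac{1}{(N - 1;\, n - 1)^2} \sum_{i_1 < \dots < i_n} \frac{Z_{i_1} + \dots + Z_{i_n}}{P(n, r)}}_{\text{Within variance}} + \underbrace{\frac{1}{(N - 1;\, n - 1)^2} \sum_{i_1 < \dots < i_n} \frac{(Y_{i_1} + \dots + Y_{i_n})^2}{P(n, r)} - Y^2}_{\text{Between variance}}$

where $Z_i = M_i (M_i - m_i) \sigma_i^2 / m_i$, $m_i$ is the number of units sampled
from $M_i$ units and $\sigma_i^2$ is the within variance of the ith p.s.u. sampled.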
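Special Cases.

r = 1, n = n.

(10)   System: $\left[ P(n, 1),\ \frac{X \sum_{j=1}^{n} y_{i_j}}{\sum_{j=1}^{n} X_{i_j}} \right]$

(11)   $\operatorname{Var}\left[ \frac{X \sum_{j=1}^{n} y_{i_j}}{\sum_{j=1}^{n} X_{i_j}} \right] = \frac{1}{(N - 1;\, n - 1)} \sum_{i_1 < \dots < i_n} \frac{(Y_{i_1} + \dots + Y_{i_n})^2\, X}{X_{i_1} + \dots + X_{i_n}} - Y^2 + \frac{1}{(N - 1;\, n - 1)} \sum_{i_1 < \dots < i_n} \frac{(Z_{i_1} + \dots + Z_{i_n})\, X}{X_{i_1} + \dots + X_{i_n}}$

r = 1, n = 2.

(12)   System: $\left[ \frac{X_{i_1} + X_{i_2}}{(N - 1)\, X},\ \frac{(y_{i_1} + y_{i_2})\, X}{X_{i_1} + X_{i_2}} \right]$

and

(13)   $\operatorname{Var}\left[ \frac{(y_{i_1} + y_{i_2})\, X}{X_{i_1} + X_{i_2}} \right] = \sum_{i_1 < i_2} \frac{(Y_{i_1} + Y_{i_2})^2\, X}{(X_{i_1} + X_{i_2})(N - 1)} - Y^2 + \sum_{i_1 < i_2} \frac{(Z_{i_1} + Z_{i_2})\, X}{(N - 1)(X_{i_1} + X_{i_2})}$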

4. Two other cases. 

Case 1. r = 0, n = n. In this case the n units are selected with equal probability
and without replacement.

(14)   $P(n, 0) = \frac{1}{(N;\, n)}$.

The unbiased system and its variance are given by substituting P(n, 0) for
P(n, r) in (7) and (9) respectively.

Case 2. r = n, n = n. In this case the n units are selected with p.p.s. and
without replacement. Substituting n for r in (2)

(15)   $P(n, n) = Q(i_1, \dots, i_n)$.

The unbiased system and its variance are given by substituting P(n, n) from
(15) for P(n, r) in (7) and (9) respectively. A practical case of interest is when
n = 2. The unbiased system is

(16)   $\left[ P(2, 2),\ \frac{y_{i_1} + y_{i_2}}{(N - 1)\, P(2, 2)} \right]$

where

$P(2, 2) = Q(i_1, i_2) = \frac{X_{i_1} X_{i_2}}{X} \left[ \frac{1}{X - X_{i_1}} + \frac{1}{X - X_{i_2}} \right]$.

The variance of (16) is obtained by substituting the value of P(2, 2) for P(n, r)
in (9).


5. Unbiased estimate of the between p.s.u. component of variance. 


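THEOREM 4. An unbiased estimate of the between p.s.u. component of variance
in (9) is given by

(17)   $G_n = Q_n - \sum_{j=1}^{n} a_{i_j} Z_{i_j}$

where

(18)   $Q_n = \sum_{j=1}^{n} a_{i_j}\, y_{i_j}^2 + 2 \sum\sum_{i_j < i_k} c_{i_j i_k}\, y_{i_j} y_{i_k}$,

(19)   $a_{i_j} = \frac{1}{(N - 1;\, n - 1)^2 P^2(n, r)} - \frac{1}{(N - 1;\, n - 1)\, P(n, r)}$,
       $c_{i_j i_k} = \frac{1}{(N - 1;\, n - 1)^2 P^2(n, r)} - \frac{(N;\, 2)}{(n;\, 2)(N;\, n)\, P(n, r)}$,

and $Z_{i_j}$ in (17) is an unbiased estimate of $M_{i_j} (M_{i_j} - m_{i_j}) \sigma_{i_j}^2 / m_{i_j}$.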


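PROOF.

(20)   $E[G_n] = E[Q_n] - E\left[ \sum_{j=1}^{n} a_{i_j} Z_{i_j} \right] = \sum_{i_1 < \dots < i_n} P(n, r) \left( \sum_{j=1}^{n} a_{i_j} Y_{i_j}^2 + 2 \sum\sum_{i_j < i_k} c_{i_j i_k} Y_{i_j} Y_{i_k} \right)$.

Also if $Q_n - \sum_{j=1}^{n} a_{i_j} Z_{i_j}$ is an unbiased estimate of the between p.s.u.
component of variance in (9)

(21)   $E\left[ Q_n - \sum_{j=1}^{n} a_{i_j} Z_{i_j} \right] = \frac{1}{(N - 1;\, n - 1)^2} \sum_{i_1 < \dots < i_n} \frac{(Y_{i_1} + \dots + Y_{i_n})^2}{P(n, r)} - \left[ \sum_{i_1 < \dots < i_n} \frac{Y_{i_1} + \dots + Y_{i_n}}{(N - 1;\, n - 1)} \right]^2$

       $= \frac{1}{(N - 1;\, n - 1)^2} \sum_{i_1 < \dots < i_n} \frac{(Y_{i_1} + \dots + Y_{i_n})^2}{P(n, r)} - \sum_{i_1 < \dots < i_n} \left[ \frac{\sum_{j=1}^{n} Y_{i_j}^2}{(N - 1;\, n - 1)} + \frac{2\, (N;\, 2)}{(n;\, 2)(N;\, n)} \sum\sum_{i_j < i_k} Y_{i_j} Y_{i_k} \right]$.

Hence by comparing coefficients of the terms of the type $Y_{i_j}^2$, $Y_{i_j} Y_{i_k}$, etc.,
in (20) and (21) we have $a_{i_j}$, $c_{i_j i_k}$ as in (19). For unistage sampling, an unbiased
estimate of the variance of the estimate is given by

(22)   $\sum_{j=1}^{n} a_{i_j} Y_{i_j}^2 + 2 \sum\sum_{i_j < i_k} c_{i_j i_k} Y_{i_j} Y_{i_k}$.

This follows as a special case of (17) when the within component of variance is
zero.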
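
For unistage sampling the unbiasedness claimed in Theorem 4 can be verified by enumerating every possible sample. The sketch below is illustrative only: the population values, the function names and the choice n = 2, r = 1 are assumptions made for the example, and the coefficients are taken in the form $a = 1/(BP)^2 - 1/(BP)$, $c = 1/(BP)^2 - \kappa/P$ with $B = (N-1;\, n-1)$ and $\kappa = (N;2)/\{(n;2)(N;n)\}$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def q(sizes, chosen):
    """Probability that the set `chosen` is drawn by successive p.p.s. draws."""
    if not chosen:
        return Fraction(1)
    total = sum(sizes.values())
    return sum(
        Fraction(sizes[j], total)
        * q({u: x for u, x in sizes.items() if u != j}, chosen - {j})
        for j in chosen
    )


def p_nr(sizes, sample, r):
    """P(n, r) as in Theorem 2."""
    q_sum = sum(q(sizes, frozenset(c)) for c in combinations(sample, r))
    return q_sum / comb(len(sizes) - r, len(sample) - r)


def between_variance_check(sizes, totals, n, r):
    """Return (true variance, expected value of the estimate, all estimates)."""
    big_n = len(sizes)
    b = comb(big_n - 1, n - 1)
    kappa = Fraction(comb(big_n, 2), comb(n, 2) * comb(big_n, n))
    y_total = sum(totals.values())
    true_var, expected, estimates = Fraction(0), Fraction(0), {}
    for s in combinations(sorted(sizes), n):
        p = p_nr(sizes, s, r)
        est_total = Fraction(sum(totals[i] for i in s)) / (b * p)
        true_var += p * (est_total - y_total) ** 2
        a = 1 / (b * p) ** 2 - 1 / (b * p)
        c = 1 / (b * p) ** 2 - kappa / p
        g = a * sum(totals[i] ** 2 for i in s)
        g += 2 * c * sum(totals[i] * totals[j] for i, j in combinations(s, 2))
        estimates[s] = g
        expected += p * g
    return true_var, expected, estimates


# Hypothetical population: size measures X_i and unit totals Y_i.
sizes = {1: 1, 2: 2, 3: 3, 4: 4}
totals = {1: 3, 2: 5, 3: 10, 4: 11}
true_var, expected, estimates = between_variance_check(sizes, totals, n=2, r=1)
```

In this made-up population the expected value of the estimate equals the true variance exactly, while at least one individual sample yields a negative estimate, which is the defect discussed in the text that follows.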

Further, the necessary and sufficient condition that the quadratic form $Q_n$
is nonnegative is that all principal minors of the quadratic form are nonnegative.
Considering only the first and second order principal minors we have as a
necessary condition for the existence of an unbiased estimate of the between p.s.u.
component of variance

(23)   $\frac{1}{(N - 1;\, n - 1)} \ge P(n, r)$

and

(24)   $\frac{2(n - 1)}{(N + n - 2)(N - 1;\, n - 1)} \ge P(n, r)$.

Condition (24) implies (23) but is not generally true. In fact, if n = 2, the
inequality (24) reduces to $2 / N(N - 1) \ge P(2, r)$ which holds only when all the
elements are selected with equal probability but without replacement.

Special Cases. 

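r = 1, n = n. An unbiased estimate of the between p.s.u. component of
variance is given by (17) where

$a_{i_j} = \left( \frac{X}{\sum_{j=1}^{n} X_{i_j}} \right)^2 - \frac{X}{\sum_{j=1}^{n} X_{i_j}}$,
$c_{i_j i_k} = \left( \frac{X}{\sum_{j=1}^{n} X_{i_j}} \right)^2 - \frac{(N;\, 2)(N - 1;\, n - 1)}{(n;\, 2)(N;\, n)} \cdot \frac{X}{\sum_{j=1}^{n} X_{i_j}}$.

r = 1, n = 2. An unbiased estimate of the between p.s.u. component of
variance is given by (17) where

$a_{i_1} = a_{i_2} = \left( \frac{X}{\sum_{j=1}^{2} X_{i_j}} \right)^2 - \frac{X}{\sum_{j=1}^{2} X_{i_j}}$,
$c_{i_1 i_2} = \left( \frac{X}{\sum_{j=1}^{2} X_{i_j}} \right)^2 - (N - 1) \frac{X}{\sum_{j=1}^{2} X_{i_j}}$.

r = 2, n = 2. An unbiased estimate of the between p.s.u. component of
variance is given by (17) where

$a_{i_1} = a_{i_2} = \frac{1}{(N - 1)^2 P^2(2, 2)} - \frac{1}{(N - 1)\, P(2, 2)}$,
$c_{i_1 i_2} = \frac{1}{(N - 1)^2 P^2(2, 2)} - \frac{1}{P(2, 2)}$.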
6. A biased estimate of the between p.s.u. component of variance.

THEOREM 5. A biased estimate of the between p.s.u. component of variance which
is more efficient than the unbiased estimate (17) is given by

(28)   (i) Zero where (17) is negative;
       (ii) (17) where (17) is positive and where the $a_{i_j}$'s and $c_{i_j i_k}$'s are given by (19).
i j°tk Y 
PROOF.

Let the between variance in (9) be $E\left[ Q_n - \sum_{j=1}^{n} a_{i_j} Z_{i_j} \right] = \beta$ where β ≥ 0.

Denote the estimate (28) by $R_n$. Then $R_n = G_n$ for the set of points for which
$G_n \ge 0$ and $R_n = 0$ for the set of points for which $G_n < 0$, i.e., $G_n = -H_n$ (say)
and

$E[R_n - \beta]^2 = \sum_1 [(G_n - \beta)^2 P(n, r)] + \sum_2 \beta^2 P(n, r)$,

$E[G_n - \beta]^2 = \sum_1 [(G_n - \beta)^2 P(n, r)] + \sum_2 (H_n + \beta)^2 P(n, r)$,

where $\sum_1$ denotes summation for nonnegative values of $G_n$ and $\sum_2$ denotes
summation for negative values of $G_n$. Then $E[R_n - \beta]^2 < E[G_n - \beta]^2$ if $G_n$ assumes
negative values with positive probability. Hence (28) is more efficient than (17).
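
The comparison in the proof can be reproduced for a small unistage example. This is a sketch under stated assumptions, not the author's computation: it uses the case r = 1, n = 2, for which $P = (X_{i_1} + X_{i_2})/\{(N-1)X\}$, $a = t^2 - t$ and $c = t^2 - (N-1)t$ with $t = X/(X_{i_1} + X_{i_2})$; the population values and the name `mse_comparison` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def mse_comparison(sizes, totals):
    """Mean square errors of the unbiased estimate and its truncated version."""
    big_n = len(sizes)
    x_total = sum(sizes.values())
    rows = []
    for i, j in combinations(sorted(sizes), 2):
        pair_size = sizes[i] + sizes[j]
        p = Fraction(pair_size, (big_n - 1) * x_total)  # P(2, 1)
        t = Fraction(x_total, pair_size)
        a = t * t - t
        c = t * t - (big_n - 1) * t
        g = a * (totals[i] ** 2 + totals[j] ** 2) + 2 * c * totals[i] * totals[j]
        rows.append((p, g))
    beta = sum(p * g for p, g in rows)  # between variance, since E[G] = beta
    mse_unbiased = sum(p * (g - beta) ** 2 for p, g in rows)
    # Truncated estimate: zero where the unbiased estimate is negative.
    mse_truncated = sum(p * (max(g, 0) - beta) ** 2 for p, g in rows)
    return beta, mse_unbiased, mse_truncated


# Hypothetical population: size measures X_i and unit totals Y_i.
sizes = {1: 1, 2: 2, 3: 3, 4: 4}
totals = {1: 3, 2: 5, 3: 10, 4: 11}
beta, mse_unbiased, mse_truncated = mse_comparison(sizes, totals)
```

Because the unbiased estimate is negative for at least one sample of this population, the truncated estimate has the strictly smaller mean square error, as the proof requires.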
BALANCED INCOMPLETE BLOCK DESIGNS AND
TACTICAL CONFIGURATIONS


D. A. Sprott 


University of Toronto 


1. Summary and Introduction. A balanced incomplete block design (BIB 
design) is an arrangement of v varieties or treatments in b blocks of k distinct 
varieties each, so that each variety is contained in r blocks and every pair of 
varieties is contained in λ blocks. Various methods of constructing such
designs are discussed in [2], and certain designs are listed in [3], [4], [5], [7], [14].
If v = b, the design is said to be symmetric; the impossibility of certain sym- 
metric designs was proved in [10]. 

Although in [8] certain tactical configurations are discussed, it seems that the 
relationship between BIB designs and tactical configurations, and in particular, 
the Steiner system, has been overlooked. It is the purpose of this note to point 
out this relationship and to discuss the properties of designs arising from such 
configurations. 





2. Tactical configurations. A complete α-β-k-v configuration is an arrangement
of v elements in blocks of k so that each set of β elements occurs in exactly α
blocks. A Steiner system is a complete 1-β-k-v configuration, that is, v elements
arranged in blocks of k so that each set of β elements occurs exactly once. Various
systems of this kind are discussed in [6], [9], [12], [13]. We shall use the notation
S(β, k, v) to denote a Steiner system; thus S(2, 3, v) is a triple system,
$S(2, p^n + 1, 1 + p^n + p^{2n})$ is a finite two-dimensional projective geometry, and
$S(2, p^n, p^{2n})$ is a finite two-dimensional Euclidean geometry.

A list of some of the properties of Steiner systems is: 

(1) The existence of S(β, k, v) implies the existence of S(β − r, k − r, v − r) for
r < β, [12], [13]. (Similarly, the existence of the α-β-k-v configuration implies the
existence of the α-(β − r)-(k − r)-(v − r) configuration.)

(2) The existence of $S(2, p^n, v_1)$ and $S(2, p^n, v_2)$ implies the existence
of $S(2, p^n, v_1 v_2)$ [12], [13].

(3) The existence of $S(2, p^n + 1, v)$ implies the existence of $S(2, p^n + 1,
p^n v + 1)$ [12], [13].

(4) The existence of S(3, 4, v) implies the existence of S(3, 4, 2v) [12], [13].

(5) S(3, p" + 1, p” + 1) exists if p is prime [12], [13]. 

(6) S(2, 2’, 4(4” — 2’)) exists [12], [13]. 

(7) The triple system S(2, 3, v) exists for v = 6m + 1 or 6m + 3, [2], [6], [9].

(8) The 2-3-4-(p + 1) configuration exists if p is a prime of the form 6m + 1. 
If m is odd, say m = 2n + 1, the system subdivides into two 1-3-4-(6n + 4) 
configurations [6]. 


Received November 22, 1954, revised April 14, 1955. 
3. The α-β-k-v configuration and BIB designs.
THEOREM 3.1. The α-β-k-v configuration is a BIB design with parameters

$v, \quad b = \alpha \binom{v}{\beta} \Big/ \binom{k}{\beta}, \quad r = \alpha \binom{v-1}{\beta-1} \Big/ \binom{k-1}{\beta-1}, \quad k,$
$\lambda = \alpha \binom{v-2}{\beta-2} \Big/ \binom{k-2}{\beta-2}$ if β ≥ 3, λ = α if β = 2.

PROOF. Each set of β elements determines α and only α blocks. There are
$\binom{v}{\beta}$ such sets, each block giving $\binom{k}{\beta}$ of them. Hence the number of blocks is

$b = \alpha \binom{v}{\beta} \Big/ \binom{k}{\beta}$.

The number of replications is the block number for the α-(β − 1)-(k − 1)-(v − 1)
configuration, and so

$r = \alpha \binom{v-1}{\beta-1} \Big/ \binom{k-1}{\beta-1}$.

Similarly, λ is the block number for the α-(β − 2)-(k − 2)-(v − 2) configuration,
and so

$\lambda = \alpha \binom{v-2}{\beta-2} \Big/ \binom{k-2}{\beta-2}$.

If β = 2, every pair of elements occurs α times by definition. Finally, every block
contains k different elements.

COROLLARY. Every set of u elements occurs in $\lambda_u$ blocks where

$\lambda_u = \alpha \binom{v-u}{\beta-u} \Big/ \binom{k-u}{\beta-u}$.

PROOF. This is just the block number for the α-(β − u)-(k − u)-(v − u)
configuration.
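
The parameter formulas of Theorem 3.1 and its corollary are easy to evaluate mechanically. The following sketch is an illustration with hypothetical function names; it computes $\lambda_u$ and the BIB parameters of an α-β-k-v configuration and rejects parameter sets for which some count is not an integer. Integrality is only a necessary condition, so a returned result does not show that the configuration exists.

```python
from fractions import Fraction
from math import comb


def lambda_u(alpha, beta, k, v, u):
    """Number of blocks containing a given set of u elements (u <= beta)."""
    value = Fraction(alpha * comb(v - u, beta - u), comb(k - u, beta - u))
    if value.denominator != 1:
        raise ValueError("parameters are not integral for u = %d" % u)
    return int(value)


def bib_parameters(alpha, beta, k, v):
    """BIB parameters (v, b, r, k, lambda) of an alpha-beta-k-v configuration."""
    return {
        "v": v,
        "b": lambda_u(alpha, beta, k, v, 0),  # block number
        "r": lambda_u(alpha, beta, k, v, 1),  # replications
        "k": k,
        "lambda": lambda_u(alpha, beta, k, v, 2),
    }


steiner_5_6_12 = bib_parameters(1, 5, 6, 12)
config_2_3_4_14 = bib_parameters(2, 3, 4, 14)
```

The two calls correspond to the Steiner system S(5, 6, 12) and to the 2-3-4-14 configuration.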


4. Series of BIB designs formed from tactical configurations. 

THEOREM 4.1. If a BIB design with parameters v, b, r, k, λ = $\lambda_2$ has the additional
property that every set of u elements occurs exactly $\lambda_u$ times, then the design splits into
two smaller designs with parameters

(1)   $v' = v - 1, \quad b' = b - r, \quad r' = r - \lambda_2, \quad k' = k, \quad \lambda_i' = \lambda_i - \lambda_{i+1}$,

(2)   $v'' = v - 1, \quad b'' = r, \quad r'' = \lambda_2, \quad k'' = k - 1, \quad \lambda_i'' = \lambda_{i+1}$.

PROOF. Consider the blocks remaining when, from the original design, all
blocks containing a given element are deleted. In the original design, every set
of i elements occurred $\lambda_{i+1}$ times with the given element, and $\lambda_i$ times in all.
Since only blocks containing the given element were deleted, we get

$\lambda_i' = \lambda_i - \lambda_{i+1}, \quad r' = r - \lambda_2, \quad b' = b - r$.

This is the first design of the theorem. If the given element is deleted from all
blocks which contain it, design (2) is obtained with $v'' = v - 1$, $b'' = r$, $k'' = k - 1$.
$\lambda_i''$ is the number of times that every set of i elements occurs, and this is
equal to the number of times that every set of i elements occurred with the given
element in the original design, that is to $\lambda_{i+1}$. Obviously, $r'' = \lambda_2$.







COROLLARY. Any α-β-k-v configuration with β > 2 splits in the manner described
in Theorem 4.1.

PROOF. Theorem 3.1 and the definition of the complete α-β-k-v configuration
ensure that the required conditions are satisfied.

Example 4.1. The 2-3-4-(p + 1) configuration exists if p is a prime of the
form 6m + 1. Hence we obtain from it the design

$v = p + 1, \quad b = p(p^2 - 1)/12, \quad r = p(p - 1)/3, \quad k = 4, \quad \lambda_2 = p - 1, \quad \lambda_3 = 2$.

If m = 2 (p = 13) we have
v = 14, b = 182, r = 52, k = 4, $\lambda_2$ = 12, $\lambda_3$ = 2,
which splits into the two designs (13, 130, 40, 4, 10) and (13, 52, 12, 3, 2).
Example 4.2. A great many Steiner systems with β = 3 are known; however,
only four are known with β > 3, namely, S(5, 6, 12), S(5, 8, 24), together with
their derived systems S(4, 5, 11) and S(4, 7, 23). The method of constructing
these systems is outlined in [6]. Thus the design S(5, 6, 12), with parameters

v = 12,  b = 132,  r = 66,  k = 6,
λ2 = 30,  λ3 = 12,  λ4 = 4,  λ5 = 1,

splits to form the following series of designs:

S(4, 5, 11)   v = 11,  b = 66,  r = 30,  k = 5,  λ2 = 12,  λ3 = 4,  λ4 = 1
S(3, 4, 10)   v = 10,  b = 30,  r = 12,  k = 4,  λ2 = 4,   λ3 = 1
S(2, 3, 9)    v = 9,   b = 12,  r = 4,   k = 3,  λ2 = 1
              v = 11,  b = 66,  r = 36,  k = 6,  λ2 = 18,  λ3 = 8,  λ4 = 3
              v = 10,  b = 36,  r = 18,  k = 5,  λ2 = 8,   λ3 = 3
              v = 10,  b = 30,  r = 18,  k = 6,  λ2 = 10,  λ3 = 5
              v = 9,   b = 18,  r = 8,   k = 4,  λ2 = 3
              v = 9,   b = 18,  r = 10,  k = 5,  λ2 = 5
              v = 9,   b = 12,  r = 8,   k = 6,  λ2 = 5

A similar series of nine designs arises from S(5, 8, 24). 
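
The series in Example 4.2 can be regenerated by applying the two parameter maps of Theorem 4.1 repeatedly. The sketch below is an illustration only: the tuple layout `(v, b, r, k, (λ2, λ3, ...))` and the function names are assumptions made here, and the code manipulates parameters only, without constructing any blocks.

```python
def split(design):
    """Apply Theorem 4.1 to (v, b, r, k, lams), where lams = (λ2, λ3, ...)."""
    v, b, r, k, lams = design
    first = (
        v - 1,
        b - r,
        r - lams[0],
        k,
        tuple(lams[i] - lams[i + 1] for i in range(len(lams) - 1)),
    )
    second = (v - 1, r, lams[0], k - 1, tuple(lams[1:]))
    return first, second


def split_series(design):
    """All distinct designs reachable from `design` by repeated splitting."""
    seen, frontier = set(), [design]
    while frontier:
        current = frontier.pop()
        if len(current[4]) < 2:
            continue  # no λ3 available, so the splitting stops here
        for child in split(current):
            if child not in seen:
                seen.add(child)
                frontier.append(child)
    return sorted(seen, reverse=True)


# S(5, 6, 12) with λ2 = 30, λ3 = 12, λ4 = 4, λ5 = 1.
series = split_series((12, 132, 66, 6, (30, 12, 4, 1)))
for v, b, r, k, lams in series:
    assert b * k == v * r  # basic BIB identity bk = vr
```

Several designs are reached along more than one path (for example, the design with v = 10, b = 36 arises both as design (2) of one parent and as design (1) of another), which is why the series is collected as a set.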


5. The construction of configurations from BIB designs. Using the BIB
designs

(1) (2k, 2r, r, k, λ) or

(2) (2k − 1, b, 2λ, k, λ),
it is possible to construct a complete configuration consisting of blocks $B_i$ of
either (1) or (2) together with blocks $\bar{B}_i$; if (1) is used, blocks $\bar{B}_i$ are the
complements of blocks $B_i$, i.e., they contain the elements not contained in blocks
$B_i$; if (2) is used, blocks $\bar{B}_i$ are the complements of blocks $B_i$, and in addition
contain the element ∞. It is obvious that in either case the resulting
configuration has v = 2k elements and all blocks contain k distinct elements.

THEOREM 5.1. If in the previous construction series (1) is used, a
complete (3λ − r)-3-k-2k configuration is obtained; if series (2) is used, a complete
(b − 3λ)-3-k-2k configuration is obtained.
PROOF. Let $B_i$ be the blocks of a general design (v, b, r, k, λ); then $\bar{B}_i$ are the
blocks of a design (v, b, b − r, v − k, b − 2r + λ). Suppose that a specified
triplet x, y, z, occurs in exactly c blocks $B_i$; then exactly two of x, y, z, occur
together in λ − c blocks $B_i$, and exactly one of x, y, z, occurs in r − 2(λ − c) − c
= r − 2λ + c blocks $B_i$. Hence the total number of blocks $B_i$ containing x, y,
or z is 3(r − 2λ + c) + 3(λ − c) + c = 3r − 3λ + c, and the number of blocks
$B_i$ not containing x, y, or z is b − 3r + 3λ − c, which is equal to the number of
blocks $\bar{B}_i$ containing x, y, and z. Therefore any triplet x, y, z, occurs in b − 3r +
3λ blocks $B_i$ and $\bar{B}_i$. In series (1), b − 3r + 3λ = 3λ − r. In series (2), b − 3r +
3λ = b − 3λ. Also, ∞ occurs in all blocks $\bar{B}_i$, and hence occurs b −
2r + λ = b − 3λ times with every pair of elements. Therefore all triplets occur
b − 3λ times.
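
The construction with series (2) can be carried out explicitly for the smallest interesting case. The sketch below is an illustration under an assumption not stated in the paper: the (7, 7, 4, 4, 2) design is taken to be the seven translates of {0, 3, 5, 6} modulo 7 (the complement of the quadratic residues {1, 2, 4}). The names `INF`, `complement_construction` and `triple_counts` are hypothetical.

```python
from collections import Counter
from itertools import combinations

INF = "inf"  # label used here for the added element


def complement_construction(points, blocks):
    """Blocks B_i together with their complements extended by the element INF."""
    points = set(points)
    result = [frozenset(block) for block in blocks]
    result += [frozenset(points - set(block)) | {INF} for block in blocks]
    return result


def triple_counts(blocks):
    """Count how often each triplet of elements occurs in the blocks."""
    counts = Counter()
    for block in blocks:
        # Sorting by string gives every triplet one canonical ordering,
        # even though the points are integers and INF is a string.
        for triple in combinations(sorted(block, key=str), 3):
            counts[triple] += 1
    return counts


# Assumed base design (7, 7, 4, 4, 2): translates of {0, 3, 5, 6} modulo 7.
base_blocks = [{(x + shift) % 7 for x in (0, 3, 5, 6)} for shift in range(7)]
configuration = complement_construction(range(7), base_blocks)
counts = triple_counts(configuration)
```

Here b − 3λ = 7 − 6 = 1, so Theorem 5.1 predicts a Steiner system S(3, 4, 8): each of the 56 triplets of the eight elements should occur in exactly one of the fourteen blocks.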

Example 5.1. In series (2) choose k = 2λ, that is, (2) is the series (4λ − 1,
4λ − 1, 2λ, 2λ, λ), which exists if 4λ − 1 is a prime power (Theorem 4.1, [11]).
The theorem allows us to form the complete (λ − 1)-3-2λ-4λ configuration.
This is a series of BIB designs with parameters

$v = 4\lambda, \quad b = 2(4\lambda - 1), \quad r = 4\lambda - 1, \quad k = 2\lambda, \quad \lambda_2 = 2\lambda - 1, \quad \lambda_3 = \lambda - 1$.

This series of designs is affine resolvable (see section 7).

Example 5.2. In series (2) choose k = λ, that is, (2) is the series
(2λ − 1, 4λ − 2, 2λ, λ, λ) which exists if (2λ − 1) is a prime power. The theorem
gives us the complete (λ − 2)-3-λ-2λ configuration, which is the resolvable series
of BIB designs

$v = 2\lambda, \quad b = 4(2\lambda - 1), \quad r = 2(2\lambda - 1), \quad k = \lambda, \quad \lambda_2 = 2(\lambda - 1), \quad \lambda_3 = \lambda - 2$.

Some special cases of the preceding series are constructed in [15], where possible 
applications of such designs are also considered. 


6. Symmetrical BIB designs derived from Steiner systems. In the next two 
sections the symmetry and resolvability of designs arising from Steiner systems 
will be discussed. 

THEOREM 6.1. If S(2, k, v) exists, then a necessary and sufficient condition for
it to be a symmetrical design is

$v = k^2 - k + 1$.

The proof of this theorem is obvious.

THEOREM 6.2. Except for the trivial case S(3, 3, 4), S(3, k, v) is not a symmetrical
design.

PROOF. The existence of S(3, k, v) implies the existence of S(2, k − 1, v − 1);
applying the necessary condition b ≥ v to the latter design, we get

$v - 2 \ge (k - 1)(k - 2)$.

Also, for S(3, k, v) to be symmetric we must have
$b = v = v(v - 1)(v - 2)/k(k - 1)(k - 2)$.
This can be written as $(v - 2)^2 + (v - 2) - k(k - 1)(k - 2) = 0$. Hence
$v - 2 = \tfrac{1}{2}\{-1 + \sqrt{1 + 4k(k - 1)(k - 2)}\} \ge (k - 1)(k - 2)$, or
$(k - 1)^2 (k - 2)(k - 3) \le 0$,

admitting k = 1, 2, and 3. Since β = 3, the only possible value is k = 3, and
then v = 4.
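
The passage from the quadratic in v − 2 to the final inequality in the proof of Theorem 6.2 can be written out step by step. In the sketch below the abbreviation $A = (k - 1)(k - 2)$ is introduced only for this illustration; squaring is legitimate because $2A + 1 > 0$ for every integer k.

```latex
\begin{align*}
\tfrac{1}{2}\bigl\{-1 + \sqrt{1 + 4kA}\bigr\} &\ge A, \qquad A = (k-1)(k-2),\\
\sqrt{1 + 4kA} &\ge 2A + 1,\\
1 + 4kA &\ge 4A^2 + 4A + 1,\\
A\,(A + 1 - k) &\le 0,\\
A + 1 - k &= k^2 - 4k + 3 = (k-1)(k-3),\\
(k-1)^2 (k-2)(k-3) &\le 0.
\end{align*}
```

The same steps, with $-1$ replaced by $k - 2$ inside the braces, lead to the factor $(k - 4)$ in place of $(k - 3)$.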

THEOREM 6.3. The design (1) obtained from S(3, k, v) in Theorem 4.1 is not
symmetric unless k = 4.

PROOF. For this design to be symmetric, we must have

$v - 1 = b - r = \frac{v(v - 1)(v - 2)}{k(k - 1)(k - 2)} - \frac{(v - 1)(v - 2)}{(k - 1)(k - 2)}$.

Therefore, $(v - 2)^2 - (v - 2)(k - 2) - k(k - 1)(k - 2) = 0$, and $v - 2 =
\tfrac{1}{2}\{(k - 2) + \sqrt{(k - 2)^2 + 4k(k - 1)(k - 2)}\} \ge (k - 1)(k - 2)$ as in Theorem
6.2. Thus $(k - 1)^2 (k - 2)(k - 4) \le 0$, admitting k = 1, 2, 3, and 4. Since β = 3,
the only possible values are 3 and 4, and k = 3 is trivial. For k = 4, we have
v = 8, so that S(3, 4, 8) is the only system S(3, k, v) that splits to give a
symmetric design.

It happens that in the case of S(3, 4, 8), the derived series (2) in Theorem 4.1
is also symmetric; the two sub-designs are (7, 7, 4, 4, 2) and (7, 7, 3, 3, 1).
S(3, 4, 8) is the configuration of series in example 5.1 with λ = 2, and is a BIB
design (8, 14, 7, 4, 3).





7. Resolvability of designs derived from Steiner systems. A BIB design is
resolvable if the blocks can be grouped so that each group contains one complete
replication. These designs are discussed in [1], where it is proved that the
necessary conditions for resolvability are (1) v is divisible by k; (2) b ≥ v + r − 1.
If a design is resolvable and if b = v + r − 1, then it is affine resolvable and
$k^2$ is divisible by v; in this case, pairs of blocks chosen from two different
replications have a constant number $k^2/v$ of elements in common. We have at once that
for n > 0, $S(2, k, k^2 + nk)$ fulfils the necessary conditions for resolvability;
$S(2, k, k^2 - nk)$ is not resolvable; if $S(2, k, k^2)$ is resolvable, then it is affine
resolvable.

THEOREM 7.1. All designs S(3, k, nk) satisfy the necessary conditions for
resolvability; the only possible affine resolvable design S(3, k, nk) is S(3, 4, 8).

PROOF. Because of resolvability, b ≥ v + r − 1, which implies

$\frac{v(v - 1)(v - 2)}{k(k - 1)(k - 2)} \ge v - 1 + \frac{(v - 1)(v - 2)}{(k - 1)(k - 2)}$,

that is, $(v - 2)^2 - (k - 2)(v - 2) - k(k - 1)(k - 2) \ge 0$. Therefore
$(v - 2) \ge \tfrac{1}{2}\{(k - 2) + \sqrt{(k - 2)^2 + 4k(k - 1)(k - 2)}\}$.

For affine resolvability equality holds, and then $(k - 1)(k - 2) \le v - 2 =
\tfrac{1}{2}\{(k - 2) + \sqrt{(k - 2)^2 + 4k(k - 1)(k - 2)}\}$. Thus $(k - 1)^2 (k - 2)(k - 4) \le 0$,
admitting k = 1, 2, 3, and 4. Since β = 3, the only possible values are 3 and 4,
and k = 3 is trivial. Hence the only possible non trivial affine resolvable design
S(3, k, v) is S(3, 4, 8); since S(3, 4, 8) is a member of the series of example 5.1,
it actually is affine resolvable.

THEOREM 7.2. Design (1) of Theorem 4.1 derived from S(3, k, v) is not affine
resolvable, but satisfies the condition for resolvability.

PROOF. Here b ≥ v + r − 1 implies

$\frac{(v - 1)(v - 2)(v - k)}{k(k - 1)(k - 2)} \ge \frac{(v - 2)(v - k)}{(k - 1)(k - 2)} + v - 2$.

Proceeding as in Theorem 7.1, we get

$(k - 1)(k - 2) \le (v - 2) = \tfrac{1}{2}\{2k - 3 + \sqrt{(2k - 3)^2 + 4(k - 1)^2 (k - 2)}\}$.

For affine resolvability this simplifies to give $(k - 1)(k - 2)(k^2 - 6k + 6) \le 0$,
admitting k = 1, 2, 3, and 4. Since β = 3, the only possible values are 3 and 4.
The design corresponding to k = 3 is trivial, and the design derived
from S(3, 4, 8) is not resolvable. For k > 4, then b > v + r − 1.


8. S(3, 4, 8). S(3, 4, 8) is a BIB design with parameters (8, 14, 7, 4, 3). Num- 
bering the elements from 1 to 8, the design may be written: 


(1,6, 7,8) (2,3, 4, 5) (1,4, 5,8) (2, 3, 6, 7) 
(1,3, 5,7) (2, 4, 6, 8) (1,3, 4,6) (2, 5, 7, 8) 
(1, 2, 5,6) (3, 4, 7, 8) (1, 2,4, 7) (3, 5, 6, 8) 


(1, 2,3, 8) (4, 5, 6, 7). 


Each pair of blocks is a replication, and any block from one replication has
exactly $k^2/v = 2$ elements in common with all blocks in the other replications.
This is the only Steiner system that splits to give symmetrical designs. The 
design (7, 7, 3, 3, 1), that is, S(2, 3, 7) is obtained by selecting blocks containing 
a given element, say 8: (1, 6, 7) (1, 2, 3) (2, 4, 6) (3, 4,7) (1,4, 5) (2, 5, 7) (3, 5, 6). 
The design (1) of Theorem 4.1 is formed by the remaining blocks: (1, 3, 5, 7) 
(1, 2, 5, 6) (2, 3, 6, 7) (4, 5, 6, 7) (1, 3, 4, 6) (1, 2, 4, 7) (2, 3, 4, 5); the parameters 
are (7, 7, 4, 4, 2). 
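
The properties claimed for the listed blocks can be confirmed by counting. The following sketch is illustrative (the names `BLOCKS` and `subset_counts` are introduced here); it takes the fourteen blocks in the order printed above, so that consecutive pairs form the replications.

```python
from collections import Counter
from itertools import combinations

# The fourteen blocks of S(3, 4, 8), read row by row from the listing.
BLOCKS = [
    (1, 6, 7, 8), (2, 3, 4, 5), (1, 4, 5, 8), (2, 3, 6, 7),
    (1, 3, 5, 7), (2, 4, 6, 8), (1, 3, 4, 6), (2, 5, 7, 8),
    (1, 2, 5, 6), (3, 4, 7, 8), (1, 2, 4, 7), (3, 5, 6, 8),
    (1, 2, 3, 8), (4, 5, 6, 7),
]


def subset_counts(blocks, size):
    """Count the occurrences of every subset of the given size."""
    counts = Counter()
    for block in blocks:
        for subset in combinations(sorted(block), size):
            counts[subset] += 1
    return counts


triples = subset_counts(BLOCKS, 3)
pairs = subset_counts(BLOCKS, 2)

# Splitting on the element 8, as in the text.
derived = [tuple(x for x in b if x != 8) for b in BLOCKS if 8 in b]
residual = [b for b in BLOCKS if 8 not in b]
derived_pairs = subset_counts(derived, 2)
residual_pairs = subset_counts(residual, 2)

# Sizes of intersections of blocks taken from different replications.
replications = [BLOCKS[i:i + 2] for i in range(0, 14, 2)]
cross_sizes = {
    len(set(a) & set(b))
    for rep1, rep2 in combinations(replications, 2)
    for a in rep1
    for b in rep2
}
```

The counts agree with the parameters (8, 14, 7, 4, 3), (7, 7, 3, 3, 1) and (7, 7, 4, 4, 2) quoted in this section, and every cross-replication intersection has size 2.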
NOTES


FURTHER REMARK ON THE MAXIMUM NUMBER OF 
CONSTRAINTS OF AN ORTHOGONAL ARRAY 


By Esther Seiden¹


Howard University 


1. Summary. R. C. Bose and K. A. Bush [1] showed how to make use of the 
maximum number of points, no three collinear, in finite projective spaces for 
the construction of orthogonal arrays. In particular, this enabled them to con- 
struct an orthogonal array (81, 10, 3, 3). They proved, on the other hand, that
in the case considered the maximum number of constraints does not exceed 12. 
Hence they state, ““We do not know whether we can get 11 or 12 constraints in 
any other way.” A partial solution to this problem was given by the author [2]. 
It was shown that the number of constraints cannot exceed 11. The purpose of 
this paper is to give a complete solution to the above stated problem, namely, 
to prove that no way exists which could give a number of constraints, of the 
considered orthogonal array, greater than ten. As a consequence of the proof it 
follows also that any orthogonal array with ten constraints satisfies a unique 
algebraic solution. It is not known, however, whether the arrays constructed 
by the geometrical method form the totality of orthogonal arrays of the con- 
sidered type. 


2. Introduction. The proof is based on an algebraic property of orthogonal 
arrays, pointed out by Bose and Bush [1]. Let $n_j^i$ denote the number of columns
belonging to an array consisting of k rows that have j coincidences (j elements
equal) with the ith column. A necessary condition for an array $(\lambda s^t, k, s, t)$ to be
orthogonal is that, whatever be the number h such that 0 ≤ h ≤ t, the
following equalities hold.

(1)   $\sum_{j=h}^{k} n_j^i\, c_h^j = c_h^k\, (\lambda s^{t-h} - 1), \qquad i = 1, 2, \dots, \lambda s^t$,

where the c's are binomial coefficients, $c_h^j = \binom{j}{h}$.
In the case considered the equalities become, for i = 1, 2, ..., 81,

$\sum_{j=0}^{k} n_j^i = 80, \qquad \sum_{j=0}^{k} j(j - 1)\, n_j^i = 8k(k - 1)$,

$\sum_{j=0}^{k} j\, n_j^i = 26k, \qquad \sum_{j=0}^{k} j(j - 1)(j - 2)\, n_j^i = 2k(k - 1)(k - 2)$.


Received November 17, 1954. 
1 Now at Northwestern University. 
Use will be made of Lemma 1 proved in [2] which asserts that an orthogonal
array (81, 10, 3, 3) for which $n_j^i = 0$ for j ≥ 5 cannot be extended to an
eleven-rowed orthogonal array. It will be assumed without loss of generality that the
first column consists of zeros only.


3. Derivations. 
LEMMA 1. If k = 4, then $n_4^i$ is equal to zero or two for i = 1, 2, ..., 81.
PROOF. Equations (1) become in this case

$n_0^i + n_1^i + n_2^i + n_3^i + n_4^i = 80$,
$n_1^i + 2n_2^i + 3n_3^i + 4n_4^i = 104$,
$2n_2^i + 6n_3^i + 12n_4^i = 96$,
$6n_3^i + 24n_4^i = 48, \qquad i = 1, 2, \dots, 81$.

Thus an orthogonal array with four constraints has to satisfy one of the
following solutions.

Solution    $n_4^i$    $n_3^i$    $n_2^i$    $n_1^i$    $n_0^i$
I            0          8          24         32         16
II           1          4          30         28         17
III          2          0          36         24         18
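
The three solutions can be recovered by back-substitution in equations (1). The sketch below is an illustration, with hypothetical function names; it uses the binomial form $\sum_j \binom{j}{h} n_j = \binom{k}{h}(\lambda s^{t-h} - 1)$ with λ = s = t = 3 and k = 4, and keeps only the nonnegative integer solutions.

```python
from math import comb


def rhs(k, h, lam=3, s=3, t=3):
    """Right-hand side of equations (1) for a given h."""
    return comb(k, h) * (lam * s ** (t - h) - 1)


def solutions_k4():
    """Nonnegative solutions (n4, n3, n2, n1, n0) of equations (1) for k = 4."""
    k = 4
    found = []
    for n4 in range(0, 81):
        n3 = rhs(k, 3) - comb(4, 3) * n4                     # h = 3
        n2 = rhs(k, 2) - comb(3, 2) * n3 - comb(4, 2) * n4   # h = 2
        n1 = rhs(k, 1) - 2 * n2 - 3 * n3 - 4 * n4            # h = 1
        n0 = rhs(k, 0) - n1 - n2 - n3 - n4                   # h = 0
        if min(n0, n1, n2, n3) >= 0:
            found.append((n4, n3, n2, n1, n0))
    return found


solutions = solutions_k4()
```

Since there are five unknowns and four equations, $n_4$ acts as the free parameter; the requirement $n_3 \ge 0$ limits it to the values 0, 1 and 2.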





It is seen that Lemma 1 reduces to showing that solution II is impossible.
Clearly it will be enough to show that Lemma 1 holds for i = 1. Let us assume
furthermore that the first three rows have the form

000000000000000000000000000
000000000111222111222111222
000111222000000111111222222

111111111111111111111111111
000000000111222111222111222
000111222000000111111222222

222222222222222222222222222
000000000111222111222111222
000111222000000111111222222

where the middle and last thirds of each row are printed below the first third.
Assume now for the sake of the proof that the fourth row has a zero in the first,
second, and fourth columns. Then the remaining zeros will be distributed as
follows.





Serial number of the columns      Number of zeros in fourth row

10-15                                   1
16-21                                   2
22-27                                   3
28-30, 55-57                            1
31-33, 58-60                            2
34-36, 61-63                            3
37-42, 64-69                            5
43-54, 70-81                            7





Consider now the fourth column of the array. The assumptions made imply that
$n_{44} = 0$. Hence for i = 4 solution I will have to hold. This means that $n_{40} = 16$
and $n_{42} = 24$. It will be shown that $n_{40} = 16$ implies $n_{42} = 26$, which is
impossible. Let us find first the position of the sixteen columns which have no
coincidences with the fourth column. It follows from the distribution of zeros that
seven such columns will be found among the columns 37-42 and 64-69. Thus
the remaining nine columns will have to be among the columns 49-54, 76-81.
This in turn implies that the fourth row will have to have three zeros among the
columns 49-54 and 76-81.

Let us count now the number of columns which have two coincidences with 
the fourth column. 





Serial number of the columns      Number of columns having two coincidences
                                  with the fourth column

3
5-9
10-15
16-21
22-27
28-30, 55-57
31-33, 58-60
34-36, 61-63
43-48, 70-75

Total                                   26





This concludes the proof of the lemma. 

THEOREM 1. The number of coincidences of any two columns of a five-rowed
orthogonal array is less than five provided that $\lambda = s = t = 3$.

Proof. Suppose that there exists a column which has five coincidences with
some of the remaining columns of the array. We may assume that this column
is the first column of the orthogonal array. Consider now the solutions of
equations (1) for i = 1 and k = 5. It is easy to see that there are only four sets of
such solutions. Namely,





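Solution    n_{15}   n_{14}   n_{13}   n_{12}   n_{11}   n_{10}

I'            1        2        2       52        7       16
II'           2        0        0       60        0       18
III'          1        1        6       46       11       15
IV'           1        0       10       40       15       14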





It will be shown that none of these solutions can give rise to an orthogonal array. 

Let us assume without loss of generality that the first three rows are the same 
as considered in Lemma 1 and that the fourth and fifth rows have also a zero 
in the first column. Consider first solution I'. In this case $n_{15} = 1$ and $n_{14} = 2$.
Thus the five-rowed array contains a three-rowed subarray in which $n_{13} = 3$;
but this contradicts the fact that $\lambda = 3$. It will be shown next that the set of
solutions II' cannot lead to an orthogonal array. Consider two triples of rows,
namely the triple consisting of the second, third, and fourth rows, and that
consisting of the second, third, and fifth rows. Since $\lambda = s = t = 3$, each of these
triples of rows has to include three columns of each of the following four types.

The column has a zero in the fourth or fifth row, respectively, and one
of the four possible couples consisting of one's and two's only in the second and
third rows. By II', $n_{11} = n_{13} = 0$. Thus each of the last twelve columns of the
first third of the orthogonal array will have a zero either in the fourth or in the 
fifth row but not in both. Hence these twelve columns will be divided into two 
groups each consisting of six columns of the considered types such that one 
group belongs to the triple of rows including the second, third, and fourth rows, 
and the other to the triple containing the second, third, and fifth rows. These 
groups are clearly not identical in respect to their content—one’s and two’s— 
in the second and third rows. The remaining six columns of the considered types 
will have to be among the last twelve columns of the second and third parts of 
the array. Since $n_{11} = 0$, these six columns will have to be identical regarding
their content in the second and third row. This is clearly impossible. 

Finally, the nonexistence of the orthogonal array satisfying solution III' or
IV' follows immediately from Lemma 1. Clearly, if we delete from an array
satisfying III' one row with a zero in the column having four coincidences with
the first column, we will obtain a four-rowed subarray satisfying solution II. In
the case of solution IV' any four-rowed subarray would have to satisfy solution II.
This establishes the theorem.

COROLLARY. Any orthogonal array (81, k, 3, 3) satisfies the equalities $n_{ij} = 0$,
provided that $j \ge 5$ and $1 \le i \le 81$.

THEOREM 2. The number of constraints of an orthogonal array (81, k, 3, 3) can- 
not exceed 10. 
Proof. Theorem 2 is an immediate consequence of Theorem 1 and Lemma 1
of [2], which established that if, for some i, $n_{ij} = 0$ for all $j \ge 5$, then such an
array cannot be extended to an eleven-rowed orthogonal array.

Remark. It was also shown in Lemma 1 of [2] that if k = 10, then the array
satisfies a unique set of solutions. Namely, $n_{i4} = 60$, $n_{i3} = n_{i2} = 0$, $n_{i1} = 20$,
$n_{i0} = 0$, for all i = 1, 2, ..., 81. Hence any array constructed by the geometrical
method developed by Bose and Bush [1] will satisfy this set of solutions.
The problem of obtaining the totality of orthogonal arrays was investigated 
neither in the considered case nor in related cases. 

In conclusion, we wish to remark that this paper restores the validity of the 
abstract published in Ann. Math. Stat., Vol. 25 (1954), p. 177, which was unduly 
corrected in [2]. 

REFERENCES 
A THEOREM ON CONVEX SETS WITH APPLICATIONS
By S. SHERMAN 
Moore School of Electrical Engineering, University of Pennsylvania 


1. Summary and introduction. T. W. Anderson [1] has proved the following 
theorem and has given applications to probability and statistics. 

THEOREM 1. Let E be a convex set in n-space, symmetric about the origin. Let
$f(x) \ge 0$ be a function such that i) $f(x) = f(-x)$, ii) $\{x \mid f(x) \ge u\} = K_u$ is convex
for every u ($0 < u < \infty$) and iii) $\int_E f(x)\,dx < \infty$, then

$$\int_E f(x + ky)\,dx \ge \int_E f(x + y)\,dx \quad \text{for } 0 \le k \le 1. \tag{1}$$


The purpose of this paper is to prove what can be considered a generalization of
Anderson's Theorem and to give different statistical applications.

Functions in $L_1$ satisfying the hypothesis were called unimodal by Anderson
and he noted in [1] that if we let $\varphi(y)$ be equal to the right hand side of (1) then
$\varphi$ is not unimodal in his sense insofar as it does not necessarily satisfy ii) (i.e.,
there exist f, E, and u such that $\{x \mid \varphi(x) \ge u\}$ is not convex). His example is

Received January 17, 1955. 
the case where n = 2 and

$$f(x) = \begin{cases} 3, & |x_1| \le 1,\ |x_2| \le 1, \\ 2, & |x_1| \le 1,\ 1 < |x_2| \le 5, \\ 0, & \text{other } x, \end{cases}$$

where $x_1, x_2$ are the components of x relative to a rectangular cartesian coordinate
system.² Let E be the set of vectors where $|x_1| \le 1$, $|x_2| \le 1$. The set $\{x \mid \varphi(x) \ge 6\}$
is not convex since for x = (.5, 4) and x = (1, 0), $\varphi(x) = 6$, while for x = (.75, 2),
$\varphi(x) < 6$. The point of departure of this paper is to see what can be said about $\varphi$.
This is achieved in Theorem 2, giving a stronger and more symmetrical state- 
ment than Anderson makes (but one which does not yield more information for 
his applications). The main Lemma, presented below, proceeds along the line of 
his argument but squeezes out additional information (convexity of level lines) 
under a weaker hypothesis (no symmetry assumptions) than Anderson uses at 
the corresponding stage of his argument. 
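
The three values quoted in the example can be reproduced exactly. The sketch below assumes the two-level function of the example (f = 3 on $|x_1| \le 1$, $|x_2| \le 1$; f = 2 on $|x_1| \le 1$, $1 < |x_2| \le 5$; zero elsewhere) and E the square $|x_1| \le 1$, $|x_2| \le 1$; the helper names are hypothetical.

```python
from fractions import Fraction


def overlap(a, b, c, d):
    """Length of the intersection of the intervals [a, b] and [c, d]."""
    return max(Fraction(0), min(b, d) - max(a, c))


def phi(y1, y2):
    """Integral of f over E + y, computed exactly from interval overlaps."""
    y1, y2 = Fraction(y1), Fraction(y2)
    width = overlap(y1 - 1, y1 + 1, -1, 1)            # x1-extent on which f > 0
    inner = overlap(y2 - 1, y2 + 1, -1, 1)            # x2-extent on which f = 3
    outer = overlap(y2 - 1, y2 + 1, -5, 5) - inner    # x2-extent on which f = 2
    return width * (3 * inner + 2 * outer)


end_a = phi("0.5", 4)
end_b = phi(1, 0)
midpoint = phi("0.75", 2)   # (.75, 2) is the midpoint of (.5, 4) and (1, 0)
```

Both endpoints give 6 while the midpoint gives 5, so the set on which the integral is at least 6 is not convex.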

2. Main result. Let E and K be convex sets in n-space, $\mathcal{E}_n$ (with no symmetry
assumptions made at this time concerning E and K). For Lebesgue measurable
$A \subset \mathcal{E}_n$, let V(A) be the Lebesgue measure of A.

LEMMA 1. If $\Phi(y) = V\{(E + y) \cap K\}$, then $\{y \mid \Phi(y) \ge u\}$ is convex for each
real u and convex E and K.

Proof. For $\alpha_1, \alpha_2 \ge 0$, $\alpha_1 + \alpha_2 = 1$, we show

$$\alpha_1[(E + y_1) \cap K] + \alpha_2[(E + y_2) \cap K] \subset (E + \alpha_1 y_1 + \alpha_2 y_2) \cap K. \tag{2}$$

A typical element of the vector sum on the left hand side of the inclusion is
$\alpha_1(x_1 + y_1) + \alpha_2(x_2 + y_2)$ where $x_1, x_2 \in E$, $x_1 + y_1 \in K$, $x_2 + y_2 \in K$. Since E
and K are convex, $\alpha_1(x_1 + y_1) + \alpha_2(x_2 + y_2) \in K$ and $\alpha_1 x_1 + \alpha_2 x_2 \in E$. These
imply that $\alpha_1(x_1 + y_1) + \alpha_2(x_2 + y_2) \in (E + \alpha_1 y_1 + \alpha_2 y_2) \cap K$, which establishes
relation (2).

Suppose $\Phi(y_1) \ge u$ and $\Phi(y_2) \ge u$. It is desired to show that

$$\Phi(\alpha_1 y_1 + \alpha_2 y_2) \ge u. \tag{3}$$

By (2),

$$\Phi(\alpha_1 y_1 + \alpha_2 y_2) \ge V\{\alpha_1[(E + y_1) \cap K] + \alpha_2[(E + y_2) \cap K]\}. \tag{4}$$

By the Brunn-Minkowski inequality,

$$V^{1/n}(\alpha_1[(E + y_1) \cap K] + \alpha_2[(E + y_2) \cap K]) \ge \alpha_1 \Phi^{1/n}(y_1) + \alpha_2 \Phi^{1/n}(y_2) \ge u^{1/n}.$$

The last inequality with (4) yields (3), which proves the Lemma.
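
Lemma 1 is equivalent to saying that the volume of $(E + y) \cap K$ at a point of a segment is never below the smaller of its values at the endpoints. The sketch below spot-checks this numerically for one illustrative pair of axis-parallel rectangles in the plane; the particular rectangles, the sampling range, and the tolerance are assumptions of the example, and a passing check is an illustration rather than a proof.

```python
import random

# Illustrative convex sets (neither is symmetric about the origin).
E_LO, E_HI = (-1.0, -0.5), (2.0, 1.5)
K_LO, K_HI = (0.0, -2.0), (1.5, 3.0)


def overlap_volume(y):
    """Lebesgue measure of (E + y) intersected with K for the rectangles above."""
    volume = 1.0
    for a, b, c, d, shift in zip(E_LO, E_HI, K_LO, K_HI, y):
        volume *= max(0.0, min(b + shift, d) - max(a + shift, c))
    return volume


def level_sets_look_convex(trials=2000, seed=0, tol=1e-9):
    """Check Phi(a*y1 + (1-a)*y2) >= min(Phi(y1), Phi(y2)) on random segments."""
    rng = random.Random(seed)
    for _ in range(trials):
        y1 = [rng.uniform(-4.0, 4.0) for _ in range(2)]
        y2 = [rng.uniform(-4.0, 4.0) for _ in range(2)]
        a = rng.random()
        mid = [a * p + (1.0 - a) * q for p, q in zip(y1, y2)]
        if overlap_volume(mid) < min(overlap_volume(y1), overlap_volume(y2)) - tol:
            return False
    return True
```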

Let $\mathcal{C}_1$ be the closed convex cone generated in the uniform norm (i.e.,
$\|f\|_1 = \sup\{|f(x)|\}$) by the characteristic functions of symmetric, compact, convex
sets. Let $\mathcal{C}_2$ be the closed convex cone generated in the $L_1$ norm (i.e.,
$\|f\|_2 = \int |f|\,dx$) by the characteristic functions of symmetric, compact, convex sets.


2 The subscript notation is to have this denotation only for this example and one at 
the end of the paper. . 
Consider the norm $\|\ \|_3$ given by $\|f\|_3 = \max\{\|f\|_1, \|f\|_2\}$. Let $\mathcal{C}_3$ be the closed
convex cone generated in the $\|\ \|_3$ norm by the characteristic functions of
symmetric, compact, convex sets. Note that for E, K compact, convex and symmetric,
the convolution of $\chi_K$ and $\chi_E$, the characteristic functions of K and E
respectively, evaluated at y becomes

$$\chi_K * \chi_E(y) = \int \chi_K(x)\,\chi_E(y - x)\,dx = \Phi(y).$$

Furthermore $\Phi(y)$ is continuous and by Lemma 1 has the property that
$\{y \mid \Phi(y) \ge u\}$ is a symmetric, compact, convex set for each real u > 0.
If we let $\chi(x, u)$ be the characteristic function of $\{y \mid \Phi(y) \ge u\}$ then
$\|\varepsilon \sum_{j=1}^{\infty} \chi(x; j\varepsilon) - \Phi(x)\|_1 \to 0$ as $\varepsilon \to 0^+$. Thus $\Phi \in \mathcal{C}_1$. The same argument
shows that $\Phi \in \mathcal{C}_2$ and $\Phi \in \mathcal{C}_3$. The continuity of convolution in the $\|\ \|_1$ and
$\|\ \|_3$ norms implies

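THEOREM 2. $\mathcal{C}_1 * \mathcal{C}_2 \subset \mathcal{C}_1$ and $\mathcal{C}_3 * \mathcal{C}_3 \subset \mathcal{C}_3$.

By observing that $V(K)^{-1}\chi_K \in \mathcal{C}_3$ for K compact, symmetric, and convex and
using the continuity properties of translation in the $L_1$ norm one can extend the
conclusion of Theorem 2 to read $\mathcal{C}_3 * \mathcal{C}_3 = \mathcal{C}_3$.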

It should be observed that for each $\Phi \in \mathcal{C}_1$ and each vector $y \in \mathcal{E}_n$, $\Phi(ky) \ge \Phi(y)$
for $0 \le k \le 1$. This is true for $\chi$, the characteristic function of a symmetric
convex set, and therefore for convex combinations of these. Since this property
is preserved by taking uniform limits it follows that this property is true for each
$\Phi \in \mathcal{C}_1$. Thus Theorem 2 implies Anderson's Theorem, since $\int_E f(x + y)\,dx \in \mathcal{C}_1$.


3. Applications. In the succeeding paragraphs we generalize to n dimensions 
and strengthen slightly in the case of 1 dimension some results of Z. W. Birn- 
baum [2] on random variables with comparable peakedness. It should be noted 
that the statistical problems with which Anderson concerned himself could be 
formulated in terms of peakedness and so the succeeding remarks, e.g., Lemma 2, 
apply to his applications also. Since in the applications Birnbaum is concerned 
with continuous random variables [3] and peakedness about the origin we will 
formulate our definitions for this case. 

If Y and Z are continuous random variables whose values lie in $\mathcal{E}_n$ and whose
probability densities are $\varphi(Y)$ and $f(Z)$ respectively, then Y is said to be more
peaked (about the origin) than Z if

$$(\varphi - f) * \chi_E(0) \ge 0 \tag{5}$$

for each compact, symmetric, convex E of $\mathcal{E}_n$. This coincides with Birnbaum's
definition when n = 1.

LEMMA 2. If 1) $(\varphi - f) * \chi_E(0) \ge 0$ for each compact, symmetric, convex $E \subset \mathcal{E}_n$
and 2) $h \in \mathcal{C}_3$, then $(\varphi - f) * h * \chi_E(0) \ge 0$.

Proof. It suffices to show that $(\varphi - f) * \chi_F * \chi_E(0) \ge 0$ for compact, symmetric,
convex F since the closed (in $\|\ \|_3$ norm) convex cone generated by these
functions is dense in $\mathcal{C}_3$. However $\chi_F * \chi_E \in \mathcal{C}_2$ and so Lemma 2 follows from 1).
Using the lemma above we establish an n-dimensional variant of Birnbaum's
Lemma (with slight reduction of symmetry hypotheses).

LEMMA 3. Let $Y_1, Y_2, Z_1, Z_2$ be continuous random variables with values in $\mathcal{E}_n$
and probability densities $\varphi_1(Y_1)$, $\varphi_2(Y_2)$, $f_1(Z_1)$, and $f_2(Z_2)$ such that

1) $Y_1$ and $Y_2$ are independent, $Z_1$ and $Z_2$ are independent,

2) $\varphi_2 \in \mathcal{C}_3$ and $f_1 \in \mathcal{C}_3$,

3) $Y_i$ is more peaked (about 0) than $Z_i$ for i = 1, 2.
Then

4) $Y_1 + Y_2$ is more peaked (about 0) than $Z_1 + Z_2$.

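Proof. For each compact, symmetric, convex set, E,

$$\begin{aligned}
\varphi_1 * \varphi_2 * \chi_E(0) - f_1 * f_2 * \chi_E(0)
&= \varphi_1 * \varphi_2 * \chi_E(0) - f_1 * \varphi_2 * \chi_E(0) + f_1 * \varphi_2 * \chi_E(0) - f_1 * f_2 * \chi_E(0) \\
&= (\varphi_1 - f_1) * \varphi_2 * \chi_E(0) + (\varphi_2 - f_2) * f_1 * \chi_E(0) \ge 0.
\end{aligned}$$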
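
For n = 1 a compact, symmetric, convex set is an interval [-a, a], so condition (5) reads $\Pr\{|Y| \le a\} \ge \Pr\{|Z| \le a\}$ for every a > 0. The sketch below illustrates the conclusion of Lemma 3 in the simplest case of centered normal variables, where a sum of independent terms is again normal; the choice of normal distributions, the standard deviations, and the grid of interval half-widths are assumptions made only for this illustration.

```python
from math import erf, sqrt


def central_probability(sigma, a):
    """Pr{|X| <= a} for X normal with mean 0 and standard deviation sigma."""
    return erf(a / (sigma * sqrt(2.0)))


def more_peaked(sigma_y, sigma_z, half_widths):
    """Check condition (5) on the intervals [-a, a] for a in half_widths."""
    return all(
        central_probability(sigma_y, a) >= central_probability(sigma_z, a)
        for a in half_widths
    )


grid = [0.05 * j for j in range(1, 201)]

# Y1, Y2 with standard deviation 1; Z1, Z2 with standard deviations 2 and 1.5.
pairwise = more_peaked(1.0, 2.0, grid) and more_peaked(1.0, 1.5, grid)

# Sums of independent centered normals: variances add.
sums = more_peaked(sqrt(1.0 + 1.0), sqrt(4.0 + 2.25), grid)
```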


NOTE. The algebraic manipulations in the proof are already implicit at one
stage of Birnbaum's argument but after that he uses devices whose direct
analogue did not go through for n > 1 and whose weight is carried here by the previous
Lemma. The slight reduction in symmetry assumptions in the case n = 1 can
be established without all the machinery used here. The later parts of his paper 
also go through for the multivariate case. 

Here it may be wondered whether the requirement that $\varphi_2 \in \mathcal{C}_3$ and $f_1 \in \mathcal{C}_3$ can
be changed to $\varphi_2 \in \mathcal{C}_3$ and $f_2 \in \mathcal{C}_3$. In the following example (constructed by T. W.
Anderson and the author) for n = 1 not only $\varphi_2 \in \mathcal{C}_3$, $f_2 \in \mathcal{C}_3$ but also $Y_1$ and $Z_1$
are symmetrically distributed and the other hypotheses of the Lemma are satisfied
(with the exception of the random variables being continuously distributed,
but that can be taken care of by considering nearby distributions). Nevertheless
$Y_1 + Y_2$ is not more peaked (about 0) than $Z_1 + Z_2$.

EXAMPLE. Let $Y_1, Y_2$ be independent random variables, $Z_1, Z_2$ independent
random variables such that $Y_1 = Z_1$ with $\Pr\{Y_1 = 5\} = \Pr\{Y_1 = -5\} =
\Pr\{Z_1 = 5\} = \Pr\{Z_1 = -5\} = 1/2$ and


(3, (Yl s 
g2( Yo) = | 

\0, \Y2} > 
bie (Z| S 
ree eae SNe 


Here it is not true that $Y_1 + Y_2$ is more peaked (about 0) than $Z_1 + Z_2$.
We close this note with the following


CONJECTURE. Let $f \in L_1(\mathcal{E}_n)$ and let $\int_E f(x + y)\,dx = \Phi_E(y)$; then $\Phi_E(ky) \ge \Phi_E(y)$
for each $y \in \mathcal{E}_n$, $0 \le k \le 1$, and for each compact, symmetric, convex $E \subset \mathcal{E}_n$
implies that $f \in \mathcal{C}_2$.







REFERENCES 
[1] T. W. Anderson, "The integral of a symmetric unimodal function over a symmetric
convex set and some probability inequalities," Proc. Amer. Math. Soc., Vol. 6
(1955), pp. 170-176.
[2] Z. W. Birnbaum, "On random variables with comparable peakedness," Ann. Math.
Stat., Vol. 19 (1948), pp. 76-81.
[3] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.




ABSTRACTS OF PAPERS 


(Abstracts of papers presented at the Berkeley meeting of the Institute, July 14-16, 1955) 


1. Nonparametric Mean Estimation of Percentage Points and Density Function
Values. John E. Walsh, Lockheed Aircraft Corporation.


Consider a sample of size n from a statistical population with probability density
function f(x) and 100p per cent point $\theta_p$. The function f(x) is of an analytic nature. Some
methods are presented for approximate nonparametric expected value estimation of $\theta_p$ and of
$1/f(\theta_p)$. A nonparametric estimate whose expected value differs from $\theta_p$ by terms of order
$n^{-7/2}$ can be obtained. For $1/f(\theta_p)$, an estimate whose expected value is accurate to terms
of order n~* can be obtained. The estimates developed consist of linear functions of specified
order statistics of the sample. The order statistics used are sample percentage points with
percentage values which are near 100p. Let m be the number of order statistics appearing
in an estimate ($m \le 7$). Coefficients for the linear estimation function are obtained by
solving a specified set of m linear equations in m unknowns. All estimates derived for $\theta_p$ have
variances of the form $p(1 - p)/nf(\theta_p)^2$ + O(n-*/?). Without additional information, all
that can be determined about the variances of the estimates derived for $1/f(\theta_p)$ is that they
are O(n~/?). Thus both types of estimates are consistent but the estimates for $\theta_p$ are more
efficient than those for $1/f(\theta_p)$.


2. On the Concept of Probability in Quantum Mechanics. A. O. Barut, Stan- 
ford. 


Some mathematical consequences of the following particular probability measure are
discussed: Consider the one to one correspondence between the elements of the sample
space $\Omega$ and the linearly independent elements of a unitary space V (in general a Hilbert
space). The probability measure of sets in $\Omega$ is defined by $p(S) = (P_S x, x) = \|P_S x\|^2$,
where (x, x) = 1 and $P_S$ is the projection operator on the manifold spanned by vectors
corresponding to the points in S. The vector x characterizes the system or the experiment.
It follows from p(S) that random variables are represented by linear Hermitian operators.
These random variables may have an intrinsic correlation coefficient even though they
are independent in the ordinary sense; they apply to a larger class of phenomena.


3. Two-Sample Estimates of Prescribed Precision. (Preliminary Report.)
Allan Birnbaum, Columbia University and Stanford University.


Let $x_1, x_2, \dots$ be independent observations on a random variable X with density (or
discrete probability) function $f(x, \theta)$, with $\theta$ unknown, $\theta \in \Omega$, $E(X) = \mu = \mu(\theta)$,
$\operatorname{Var}(X) = \sigma^2(\theta)$. Suppose an unbiased estimate of $\mu$ is required, with variance not
exceeding a given positive v. Let $u = u(x_1, \dots, x_m)$ be such that $E(u) \le v/\sigma^2(\theta)$. Let n = 1/u.
Let n' = [n] = the largest integer not exceeding n, with probability $\gamma_n$, and n' = [n] + 1,
with probability $1 - \gamma_n$, where $\gamma_n$ is determined so that E(1/n' | n) = 1/n. Then the
estimate $\mu' = (x_{m+1} + \dots + x_{m+n'})/n'$ satisfies the requirements. If $E(u) = v/\sigma^2(\theta)$, then
$\operatorname{Var}(\mu') = v$, and the estimates $\mu'$ satisfy homoscedasticity assumptions of standard
statistical methods; such functions u are given for estimating variances of normal
distributions and components of variance. For estimating binomial and Poisson means,
approximate homoscedasticity is obtained by use of
$u = c/[(\sum_1^m x_i + 1)(m - \sum_1^m x_i + 1)]$ (binomial) and $u = c/(\sum_1^m x_i + 1)$
(Poisson), where c is a suitable constant. Efficiencies, measured by the Wolfowitz-Cramér-Rao
bound, are good in many cases. Relative to single sample estimates, appreciable savings
are obtained in binomial estimation except for p near 1/2. Refined estimates based on
complete statistics are being investigated. Dr. W. C. Healy has just informed the writer
that most of the above ideas were recently found independently by him.
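
The randomization probability in the second-stage sample size follows from the single condition E(1/n' | n) = 1/n. Writing m = [n], the condition $\gamma/m + (1 - \gamma)/(m + 1) = 1/n$ gives $\gamma = m(m + 1 - n)/n$. The sketch below computes this in exact arithmetic; the function name is hypothetical, and it assumes n is at least 1 so that [n] is a positive integer.

```python
from fractions import Fraction
from math import floor


def randomized_sample_size(n):
    """Return (m, gamma): use m = [n] with probability gamma, else m + 1.

    gamma is chosen so that the expected value of 1/n' equals 1/n.
    Assumes n >= 1.
    """
    n = Fraction(n)
    m = floor(n)
    gamma = Fraction(m) * (m + 1 - n) / n
    return m, gamma


m, gamma = randomized_sample_size(Fraction(7, 2))
expected_reciprocal = gamma / m + (1 - gamma) / (m + 1)
```

When n is already an integer the rule is non-random, since then the probability of using [n] equals one.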


4. Minimax Test for the Parameter of a Poisson Process. J. V. BREAKWELL, 
North American Aviation, Inc. 


The problem of determining all the Bayes procedures for distinguishing between two 
Poisson process rates, when the losses due to accepting the wrong rate are known and when 
the cost of observation is proportional to the observation time, is given an explicit solu- 
tion. The selection of the minimax procedure is explained. The solution is given for the 
extended problem in which a process rate can have any value and it is desired to decide 
whether this rate is above or below some critical rate, and in which the loss due to a wrong 
decision is proportional to the difference between the actual rate and the critical rate. In 
some situations the minimax procedure is a mixed strategy. For these situations an at- 
tempt is also made to obtain the minimax pure strategy. 


5. Distribution of a Definite Quadratic Form in Noncentral Normal Variates.
James Pachares, Hughes Aircraft Company.


The results of a recent paper by the author, "Note on the distribution of a definite
quadratic form," Ann. Math. Stat., Vol. 26 (1955), pp. 128-131, have been extended to the
noncentral case. That is, an expression has been derived for the distribution of a definite
quadratic form in noncentral independent normal variates which depends only on the
value of the determinant of the form and on the moments of the inverse quadratic form
in normal variates with imaginary means. THEOREM. Let $Q_n = \frac{1}{2}\sum_{j=1}^{n} a_j x_j^2$, where the $x_j$
are independent $N(\mu_j, 1)$ variates, $a_j > 0$. Let $Q_n^* = \frac{1}{2}\sum_{j=1}^{n} a_j^{-1} y_j^2$, where the $y_j$ are
independent $N(i\mu_j, 1)$ variates, $i = +\sqrt{-1}$. Then
$\Pr(Q_n < t) = e^{-\lambda} t^{n/2} |A|^{-1/2} \sum_{k=0}^{\infty} (-t)^k c_k / [k!\,\Gamma(k + 1 + n/2)]$,
where $c_k$ is the k-th moment of $Q_n^*$, $\lambda = \frac{1}{2}\sum_{j=1}^{n} \mu_j^2$, and $|A| = a_1 \cdots a_n$.
The above series is absolutely convergent. The moments of $Q_n^*$ are best obtained from the
cumulants. The k-th cumulant of $Q_n^*$ is $\frac{1}{2}(k - 1)! \sum_{j=1}^{n} a_j^{-k}(1 - k\mu_j^2)$.


6. Errors in Normal Approximations to Certain Types of Distribution Functions.
(By Title.) J. T. Chu, University of North Carolina.


Suppose that for every integer n = m = 1, $F_n(x) = C_n \int_{-\infty}^{x} (1 + y^2/n)^{-m/2}\,dy$ is a cdf,
where $C_n$ and m depend only on n and $\lim_{n \to \infty} m/n = 1$. Upper and lower bounds are
obtained for $F_n(x)$ in terms of $\Phi(x)$, the standard normal cdf, and it is shown that
$\lim_{n \to \infty} F_n(x) = \Phi(x)$ for every fixed x. Applications are given to the t-distribution,
7-distribution, the distributions of the correlation coefficients, etc. Very simple upper bounds
are derived for the errors in using the normal approximation. For example, if $F_n(x)$ is the
cdf of the t-distribution with n degrees of freedom, then $|F_n(x)/\Phi(x) - 1| < 1/n$ for all
$n \ge 8$. (Work supported by the Office of Naval Research.)


7. Estimation of the Parameters of the One- and Two-Parameter Single
Exponential Distributions from Singly and Doubly Censored Samples.
(By Title.) A. E. Sarhan and B. G. Greenberg, University of North
Carolina.


In biological data involving life-testing and incubation periods, the response pattern 
usually follows an exponential distribution. Owing to the speed of the initial reaction and 
the drawn out waiting time for the ultimate observation, samples are frequently censored 
at both termini. Tables of coefficients are provided to calculate the best linear estimate 
in such censored samples up to size 10 of both $\mu$ and $\sigma$ in the two distributions $f(y) = e^{-y/\sigma}/\sigma$
and $f(y) = e^{-(y-\mu)/\sigma}/\sigma$. Variances of these estimates and their efficiencies relative to the
uncensored sample are also provided. Combining this information with a table of expected 
waiting times, the efficiency of an experiment per unit of waiting time can be plotted against 
each observation and the censoring procedure carried out on a predetermined basis. 


8. The Joint Distribution of Serial Correlation Coefficients. (By Title.) G. S.
Watson, Australian National University.


Quenouille (Ann. Math. Stat., Vol. 20 (1949), p. 561) found the exact joint distribution 
of the serial correlation coefficients (with circular definitions) in samples of an odd num- 
ber of observations and gave an approximation to it without proof. In this paper, 
Quenouille’s distribution is derived for arbitrary sample sizes by a method which avoids 
the calculus of residues of several complex variables. It is further shown that Quenouille’s 
conjectured approximation is correct and that it arises from the smoothing of the convex 
polyhedral region of joint variation of the full set of serial correlations. A test for inde- 
pendence based on periodogram ordinates, is suggested. 


9. On Tests of Certain Hypotheses Invariant Under the Full Linear Group.
(Preliminary Report.) Charles M. Stein, Stanford University.


There exist two probability measures $\mu_1, \mu_2$ absolutely continuous with respect to
ordinary Lebesgue measure on the Cartesian product $C^3$ of three circles (circumferences) C
and a measurable set $S \subset C^3$ such that $\beta = \inf_T \mu_2(TS) > \sup_T \mu_1(TS) = \alpha$, where T ranges
over the group of homeomorphisms of C onto itself and $T(\theta_1, \theta_2, \theta_3) = (T\theta_1, T\theta_2, T\theta_3)$.
Of course the result remains true if T is restricted to be a projective transformation (with
a particular homeomorphic identification of C with the real projective line). Thus, for
testing (with a single observation) the hypothesis that a random point X of $C^3$ is distributed
so that, for some projective transformation T, TX has distribution $\mu_1$, against the
hypothesis that for some T, TX has distribution $\mu_2$, the rejection region S has minimum
power $\beta$ greater than its maximum size $\alpha$. However the induced group of T on $C^3$ is
transitive on a set whose complement has Lebesgue measure 0. Thus any test of size $\alpha$ invariant
under the projective group must also have power $\alpha$. Since the group of projective
transformations is a homomorphic image of the multiplicative group of all non-singular real
2 × 2 matrices, the result indicates that, if the classical tests in multivariate analysis
have any exact minimax properties in the class of all tests, it must be due to special
properties of the normal distribution, rather than to the group-theoretical structure alone.
10. Asymptotic Formulae for the Distribution of Hotelling's Generalized $T_0^2$
Statistic. Koichi Ito, University of North Carolina.


In the analysis of variance for the means of k p-variate normal populations, let $S_0$ and
$S_1$ be sample "within" and "between" dispersion matrices based on n and m degrees of
freedom, respectively, where it is assumed that $p \le n$ but m may be $\ge p$ or $< p$. Hotelling
defines a statistic $T_0^2$ for testing equality of the k population mean vectors as
$T_0^2 = m \operatorname{tr} S_1 S_0^{-1}$. In this paper any percentage point of the $T_0^2$ distribution under the null
hypothesis is expressed in an asymptotic series for large values of n, which involves the
corresponding percentage point of the $\chi^2$ distribution with mp degrees of freedom. This result
generalizes the asymptotic formula for the generalized Student T given by Hotelling and Frankel
(Ann. Math. Stat., Vol. (1938), p. 96). An asymptotic formula for the c. d. f. of $T_0^2$ is also
given together with an upper bound for the error committed when all but the first few terms
are omitted in the series. This formula is a sort of multivariate analogue of Hartley's
formula of "Studentization" (Biometrika, (1944), pp. 173-180).


11. Asymptotic Distribution of Maximum Likelihood Estimates in Factor
Analysis in the Loading-Normalized Case. Herman Rubin, Stanford
University.


If one considers the factor analysis problem with 23; = Zia faj + ui; , where the f’s 
have covariance matrix M and the u’s have diagonal covariance matrix = and are inde- 
pendent of the f’s, the maximum likelihood estimates of A, M, and = were derived by Ander- 
son and Rubin in a paper to appear in Proceedings of the Third Berkeley Symposium on Prob- 
ability and Statistics. By suitable matrix manipulation, we obtain a symmetric modified 
Newton’s method for =-1(A — Ao) and ($-! — 2X9"), starting from the initial approximation 
Ao , Xo . If the normalization is on A alone, then the normality of the f’s can be relaxed and 
the asymptotic covariance of the estimates is given by the inverse of the coefficient matrix. 
If we relax the condition of normality of the u’s and consider, instead of £ — 5, the matrix 
= — S, where S is the diagonal matrix of sample variances of the u’s, the asymptotic var- 
iances of the o;; are reduced by 2/e;; . 


12. A Run Test of the Hypothesis that the Median of a Stochastic Process is
Constant. T. S. Ferguson and Charles H. Kraft, University of
California.


Let $X_1, X_2, \dots, X_n, \dots$ be a sequence of observations made at times $t_1 < t_2 < \dots <
t_n < \dots$ on independent variables having a constant median equal to m. Let $Y_i = 1$ if $X_i >
m$ and zero otherwise. Let ri (resp. rz) be the number of runs of 1's (resp. 0's) of length
k out of the first r runs. The joint distribution of re k = 1, 2, --- is found and the x? test 
of the null hypothesis is proposed. Under alternative hypotheses for which P{Y; = 1} is 
periodic, the asymptotic distribution as r — © of ri is derived and the power of the test 
is found. Models are discussed for which the observations X, form a stationary process 
with E(X,Xn4x) = o%p-* and asymptotic properties of proposed tests of the analogous 
hypotheses are discussed. 


13. Analysis of Dispersion on a Sphere. (By Title.) G. S. Watson, The Aus- 
tralian National University. 


In palaeomagnetism, the observations are directions of remanent magnetism of rock 
specimens. These observations may be regarded as position vectors of points on a unit 
sphere. The probability density, exp(k cos θ), where θ is the angle between the polar and
observation vectors, has recently been suggested by Fisher for the analysis of such data.
In this paper some exact tests and a full range of approximate tests, sufficiently accurate
for most practical situations, are given for testing hypotheses about the precision constant
k and the polar vector. The approximate tests, which involve only the F-distribution,
arise from the fact that, when k is large, the points have a circular normal distribution
about the pole.


14. Convergence in Distribution and Fourier-Stieltjes Transforms of Random
Functions. (By Title. Preliminary Report.) Emanuel Parzen, Columbia
University.


Let Y(t) and $Y_n(t)$, for n = 1, 2, ..., be random functions on $0 \le t \le 1$, whose sample
functions belong to an (incomplete) Banach space $\mathcal{B}$, usually the space of functions
continuous except for a finite number of finite jumps. Define $Y_n(t)$ to converge in distribution
to Y(t) if, for every functional g[y(t)] continuous in norm except for a set of functions of
measure 0 according to the measure induced by Y(t) on $\mathcal{B}$, the random variable $g[Y_n(t)]$
converges in distribution to g[Y(t)]. Under this definition, it is immediately seen that if
T[y(t)] is an operator on $\mathcal{B}$ to itself which is "continuous almost everywhere" as above,
then $T[Y_n(t)]$ converges in distribution to T[Y(t)]. A general theorem, stating sufficient
conditions for convergence in distribution and involving countable decompositions of the
random functions, is given. It is applied to obtain quick proofs of various theorems of
Donsker, involving consecutive sums of independent random variables and empirical
distribution functions, but, up to the present time, only for the case of $L_2$ norm. In these
applications, an important role is played by the Fourier-Stieltjes transform
$Z_n(v) = \int_0^1 e^{2\pi i v t}\,dY_n(t)$, defined for v = 0, ±1, ±2, ....


15. On the Statistical Analysis of Markov Chains. (By Title. Preliminary Re- 
port) Lxo A. Goopman, University of Chicago. 


T. W. Anderson has studied statistical inference in Markov chains with particular
application to data in which each observation is a sequence of states, over a finite number
$T$ of time points, from a Markov chain with the same transition probability matrices
$P_t = \{p_{ij}(t - 1, t)\}$, and $n_i(0)$ observations are in state $i$ at the time origin (Ann. Math. Stat.,
Vol. 22 (1951), p. 607). He assumes that $n_i(0) \to \infty$, and presents likelihood ratio tests for
the following hypotheses: (a) $P_t$ is stationary (i.e., $P_t = P = \{p_{ij}\}$) against alternatives
that it varies over time; (b) $P$ is a given matrix (or that certain elements of $P$ are given);
and (c) the process is first order against the alternative that it is second order. The present
paper presents $\chi^2$ tests of goodness of fit which are analogous to the likelihood ratio tests
for hypotheses (a), (b), and (c), and similar tests are also presented for the following
hypotheses: (c’) that the process is $r$th order against the alternative that it is $(r + 1)$th order;
and (d) that $s$ samples of observations are samples from the same Markov chain P. P. G.
Hoel (Biometrika, Vol. 41 (1954), p. 430) has presented a likelihood ratio test of (c’) when
a single observation (sequence of states) is obtained and $T \to \infty$ in the ‘positively regular’
case. The present paper presents an analogous $\chi^2$ test of goodness of fit for this case. An
advantage of the $\chi^2$ tests of goodness of fit presented in the present paper is that, for many
users of these results, the motivation for these tests and the application of the methods
presented is simpler.
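
As an illustration of hypothesis (b), that $P$ is a given matrix, a Pearson-type statistic can be sketched as follows. The abstract does not write out the statistic, so the form used here, $\sum_i \sum_j (n_{ij} - n_i p^0_{ij})^2 / (n_i p^0_{ij})$ with one degree of freedom lost per row, is the usual goodness-of-fit form and should be read as an assumption. The function names `transition_counts` and `chi_square_given_matrix` are hypothetical.

```python
def transition_counts(sequences, num_states):
    """Count observed one-step transitions n_ij over all observed sequences."""
    counts = [[0] * num_states for _ in range(num_states)]
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts


def chi_square_given_matrix(counts, p0):
    """Pearson statistic and degrees of freedom for H: P = p0 (assumed form).

    Rows with no observed departures are skipped. Cells with p0[i][j] == 0
    are left out of the sum; an observed transition into such a cell would
    contradict the hypothesis outright and is not handled here.
    """
    stat = 0.0
    dof = 0
    for i, row in enumerate(counts):
        n_i = sum(row)
        if n_i == 0:
            continue
        positive = [j for j, p in enumerate(p0[i]) if p > 0]
        for j in positive:
            expected = n_i * p0[i][j]
            stat += (row[j] - expected) ** 2 / expected
        dof += len(positive) - 1
    return stat, dof


# Two short sequences on states {0, 1}, tested against a fair-coin chain.
counts = transition_counts([[0, 1, 0, 1, 0], [0, 0, 1, 1, 0]], 2)
stat, dof = chi_square_given_matrix(counts, [[0.5, 0.5], [0.5, 0.5]])
```

The statistic would then be referred to a $\chi^2$ table with `dof` degrees of freedom; that large-sample step is outside this sketch.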


16. Convergence Properties of Elements of a Certain Class of Stochastic Approximation
Processes, Continued. (By Title.) Donald L. Burkholder,
University of North Carolina.


For each natural number $n$ and each real number $x$ let $Z_n(x)$ be a random variable such
that $R_n(x) = E\,Z_n(x)$ exists. Suppose $\theta$ is a real number, $\gamma$ is a positive number and $\{\mu_n\}$
is a real number sequence such that $(x - \mu_n) R_n(x) > 0$ for each pair $n$, $x$ where $x \ne \mu_n$,
and $\theta - \mu_n = O(n^{-\gamma})$ as $n \to \infty$. Then there exists a stochastic approximation process $\{x_n\}$
such that, under several different sets of further conditions, $n^{\xi}(x_n - \theta)$ is asymptotically
normal where $0 < \xi \le 1/2$, $\xi < \gamma$. The theorems proved for the above case contain Chung’s
Theorem 9, Ann. Math. Stat., Vol. 25 (1954), pp. 463-483, on the class of Robbins-Monro 
processes. The theorems also imply asymptotic normality of the Kiefer-Wolfowitz processes 
under fairly general conditions, and asymptotic normality of certain stochastic approx- 
imation processes useful in estimating the mode of a density function, also, under fairly 
general conditions. 


——— 


(Abstracts of papers presented at the Ann Arbor meeting of the Institute, 
August 30-September 2, 1955) 


1. Distribution of the m-th Range, J. Arthur Greenwood, Manhattan Life
Insurance Company.


Gumbel [Ann. Math. Stat., vol. 18 (1947), p. 410] has expressed the distribution of the 
m-th range as the convolution of the distribution of m-th extremes. The result of Garti and 
Consoli [Studies presented to R. v. Mises, New York, 1954, p. 302, equation (1)] is applied to 
express the differential of probability of the m-th range in terms of Bessel functions of the 
third kind. Integration by parts yields the distribution. 


2. Approximation to the Distribution of the Sum of Cosines of Random Angles
(Preliminary Report), David Durand, Massachusetts Institute of Technology,
and J. Arthur Greenwood, Manhattan Life Insurance Company.


It has long been known that each component of a circularly symmetric random walk of 
n steps is normally distributed with error O(1/n). The authors recently found (Ann. Math. 
Stat., Vol. 26 (1955), p. 237) that $V = \sum_j \cos \xi_j$, where the angles $\xi_j$ are independently and
uniformly distributed, has the characteristic function $[J_0(t)]^n$. Numerical values of the
cumulants are found by expanding the log of the characteristic function, and the normal
approximation is improved by the expansion
$\Pr[V \le z(n/2)^{1/2}] = \Phi(z) - n^{-1}(\phi^{iii}/16) + n^{-2}(\phi^{v}/72 + \phi^{vii}/512) - n^{-3}(11\phi^{vii}/3072 + \phi^{ix}/1152 + \phi^{xi}/24576) + n^{-4}(19\phi^{ix}/19200 + 425\phi^{xi}/1327104 + \phi^{xiii}/36864 + \phi^{xv}/1572864) + o(n^{-4})$,
where $\phi(z)$ is the standard normal density, $\phi^{j}$ is its $j$th derivative, and $\Phi$ is its integral with lower limit $-\infty$. The expansion
is inverted for use in computing percentage points. 
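
The first two correction terms of this expansion can be evaluated numerically as in the sketch below. It assumes the standard identity $\phi^{(k)}(z) = (-1)^k He_k(z)\,\phi(z)$, where $He_k$ is the probabilists' Hermite polynomial; the function names are hypothetical and the series is truncated after the $n^{-2}$ term, so this is an illustration rather than the authors' computation.

```python
import math


def hermite(k, z):
    """Probabilists' Hermite polynomial He_k(z) by the three-term recurrence."""
    prev, cur = 1.0, z
    if k == 0:
        return prev
    for m in range(1, k):
        # He_{m+1}(z) = z * He_m(z) - m * He_{m-1}(z)
        prev, cur = cur, z * cur - m * prev
    return cur


def phi_deriv(k, z):
    """k-th derivative of the standard normal density at z."""
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (-1) ** k * hermite(k, z) * phi


def cdf_sum_of_cosines(z, n):
    """Approximate Pr[V <= z * sqrt(n/2)], truncated after the n^-2 term."""
    normal_cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    term1 = phi_deriv(3, z) / 16.0
    term2 = phi_deriv(5, z) / 72.0 + phi_deriv(7, z) / 512.0
    return normal_cdf - term1 / n + term2 / n ** 2
```

To approximate $\Pr[V \le v]$ for a given $v$, one would call `cdf_sum_of_cosines(v / math.sqrt(n / 2), n)`. Because only odd-order derivatives of the even function $\phi$ appear, the approximation keeps the symmetry of $V$ about zero.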


3. Crystalline Connectivity, J. M. Hammersley, University of Oxford.


The paper studies the number of self-avoiding walks on an abstract formulation of the 
connective structure of atomic crystals. The theory bears on the behaviour of absorbent 
media. 


4. Distributions of Roots of Algebraic Equations with Variable Coefficients,
John W. Hamblen, Oklahoma Agricultural and Mechanical College.


Consider an algebraic equation which can be written in polynomial form as
(1) $\eta^n - \xi_1 \eta^{n-1} + \xi_2 \eta^{n-2} - \cdots + (-1)^n \xi_n = 0$, where the coefficients $\xi_i$ ($i = 1, \ldots, n$) are real or
complex random variables with a given joint p.d.f. The roots of (1), $\eta_i$ ($i = 1, \ldots, n$), are
random variables which have a p.d.f. that depends upon the p.d.f. of the coefficients. The
problem under consideration is to determine the p.d.f. of the roots given the p.d.f. of the
coefficients. The case where the $\xi_i$ are complex random variables was considered in a note by
M. A. Girshick (Annals of Mathematical Statistics, Vol. 13 (1942), p. 235). When the $\xi_i$ are
real, the $\eta_i$ may be real or complex. For real $\eta_i$ the functional form of their p.d.f. is obtained
by a change of variables in the p.d.f. of the $\xi_i$ by the use of the relationships
(2) $\xi_1 = \sum_i \eta_i$, $\xi_2 = \sum_{i<j} \eta_i \eta_j$, $\ldots$, $\xi_n = \prod_i \eta_i$, with Jacobian $J$ given by $\prod_{i<j} (\eta_i - \eta_j)$. However, to
define the p.d.f. of the $\eta_i$ completely, the region of the root-space over which the p.d.f. is
greater than zero must be determined. This region is, of course, dependent upon the
subregions of the region in the coefficient-space for which the p.d.f. of the $\xi_i$ is greater than zero
which give real roots. For complex $\eta_i$ the treatment is similar, but a new set of relationships
must be found to replace (2). In this case, the $\xi_i$ must be expressed as functions of the real
and imaginary parts of the $\eta_i$ separately.
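
The change of variables (2) and its Jacobian can be checked on a small example. The sketch below builds the coefficients as elementary symmetric functions of the roots, forms the matrix of partial derivatives $\partial \xi_k / \partial \eta_j$ (each entry is an elementary symmetric function of the remaining roots), and compares its determinant with $\prod_{i<j} (\eta_i - \eta_j)$. All function names are hypothetical, and the determinant routine uses cofactor expansion, which is only practical for small $n$.

```python
from itertools import combinations
from math import prod


def elementary_symmetric(values, k):
    """Sum of products of all k-element subsets of values (equals 1 for k = 0)."""
    return sum(prod(subset) for subset in combinations(values, k))


def coefficients_from_roots(roots):
    """xi_1, ..., xi_n of relationships (2), computed from the roots eta_i."""
    return [elementary_symmetric(roots, k) for k in range(1, len(roots) + 1)]


def jacobian_matrix(roots):
    """Entry (k, j) is d xi_k / d eta_j = e_{k-1}(roots with eta_j removed)."""
    n = len(roots)
    return [
        [elementary_symmetric(roots[:j] + roots[j + 1:], k - 1) for j in range(n)]
        for k in range(1, n + 1)
    ]


def determinant(matrix):
    """Cofactor expansion along the first row."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0
    for col, entry in enumerate(matrix[0]):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * entry * determinant(minor)
    return total


def root_difference_product(roots):
    """Product over i < j of (eta_i - eta_j)."""
    return prod(a - b for a, b in combinations(roots, 2))


roots = [2, 5, -1]
xi = coefficients_from_roots(roots)        # eta^3 - 6 eta^2 + 3 eta + 10 = 0
jac = determinant(jacobian_matrix(roots))
```

For these integer roots the arithmetic is exact, so the determinant and the product of root differences can be compared directly.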


5. On a Modified Sum of Poisson Processes, A. Bruce Clarke, University
of Michigan.


Let $l$ and $m$ be positive integers and, for each $i$ with $-l \le i \le m$, let $m_i(t)$ be an
integer-valued Poisson process with parameter $\lambda_i$, proceeding by steps of magnitude $i$, the
processes being mutually independent with $m_i(0) = 0$. Let $m(t) = \sum_{i=-l}^{m} m_i(t)$. $m(t)$ will then
be a temporally homogeneous process of independent increments with generating function
$\phi(z, t) = E(z^{m(t)}) = \exp[t \sum_{i=-l}^{m} \lambda_i (z^i - 1)]$, from which the probabilities $q_n(t) = \Pr(m(t) = n)$,
$-\infty < n < \infty$, can be obtained. Let $n(t)$ be a process having the same transition probabilities
as $m(t)$ except that the process is restricted to nonnegative integer values, transitions
giving negative values being forbidden, e.g., if $l = m = 1$ this will be a simple one-step
queuing process. Let $p_n(t)$, $0 \le n < \infty$, be the probability distribution of $n(t)$, $p_n(t) =
\Pr(n(t) = n)$. Using the Kolmogorov equations it can be shown that there exists a linear
transformation giving the $p_n(t)$ in terms of the $q_n(t)$, $p_n(t) = \sum_{k=-\infty}^{\infty} a_{nk} q_k(t)$. In the case $m \ge l$,
this equation reduces to a convolution of the form $p_n(t) = \sum_i a_i q_{n+i}(t)$. Under certain
conditions the coefficients $a_i$ may be determined explicitly.


6. A Method of Constructing Nonparametric Multivariate Tests (Preliminary
Report), T. W. Anderson, Columbia University.


Let $x_1, \ldots, x_n$ be observations from $F(x)$ and $y_1, \ldots, y_m$ from $G(x)$. When the
observations are scalar, nonparametric tests of the hypothesis $F(x) = G(x)$ are based on the ranks
of the observations or, equivalently, on $m_0, \ldots, m_n$, where $m_i$ is the number of $y$’s falling
between the $i$th and $(i + 1)$-st ordered $x$’s. When the observations are vectors, let $R_0, \ldots,
R_n$ be $n + 1$ statistically equivalent blocks as defined by J. W. Tukey (“Nonparametric
estimation II. Statistically equivalent blocks and tolerance regions: the continuous case,”
Annals of Math. Stat., Vol. 18 (1947), pp. 529-539); let $m_i$ be the number of $y$’s falling in $R_i$.
Under the null hypothesis the distribution of $m_0, \ldots, m_n$ is the same in the vector case as
in the scalar case. The distribution of any test criterion based on the $m$’s is the same in both
the scalar and vector cases. The choice of the functions used to define the blocks and the
subsequent test will depend on the relevant alternative hypotheses. These blocks for one
sample can also be used to test the hypothesis that $F(x)$ is a specified distribution. (Work
sponsored by School of Aviation Medicine, Randolph Field, Texas, under Contract AF 18(600)-941.)


7. Confidence Intervals for a Measure of Effectiveness (Preliminary Report),
Gottfried E. Noether, Boston University.


Let $p_1$ and $p_2$ be the probabilities of success of two “treatments” and define $p = (p_2 - p_1)/
(1 - p_1)$. For $p_1 < p_2$, $p$ may be considered a measure of the greater effectiveness of
treatment 2. Since $1 - p = q_2/q_1 = q$, say, the problem of finding a confidence interval for $p$ is
equivalent to that of finding a confidence interval for the ratio of the parameters of two
binomial populations. If $y_i$, $i = 1, 2$, denotes the observed relative frequency of failure for
the $i$th treatment, various confidence intervals for $q$ based on normal approximations to
the distributions of $y_2/y_1$ and $y_2 - q y_1$ are derived and compared. Confidence intervals based
on joint confidence regions for $q_1$ and $q_2$ are also considered.


8. Truncated Binomial and Negative Binomial Distributions, Paul R. Rider,
Wright-Patterson Air Force Base.


This paper derives a simple estimator of the parameter of a binomial distribution from 
which one or more classes have been truncated, also easily calculated estimators of the two 
parameters of a negative binomial distribution from which the zero class is missing. These 
estimators are analogous to that previously proposed by the author for the parameter of a 
truncated Poisson distribution (Journal of the American Statistical Association, Vol. 48 
(1953), pp. 826-830). They compare favorably with maximum likelihood estimators. Illus- 
trative examples are provided. 


9. Error Rates and Sample Sizes for Multiple Range Tests, H. Leon Harter, 
Wright-Patterson Air Force Base. 


A study is made of error rates and sample sizes for three multiple range tests (the New- 
man-Keuls test, Tukey’s X procedure, and Duncan’s New Multiple Range Test). Multiple 
range tests are used for testing the significance of the range of p out of m ordered means of 
samples of size N, where p = 2, 3, --- , m. For various combinations of m and N, Table 1 
gives maximum and minimum Type I error rates $\alpha$ (as defined for these tests) when $\alpha_1$ (as
defined for the LSD test) has the values 0.05 and 0.01. For various combinations of m and
N, Table 2 gives maximum and minimum Type II error rates $\beta$ for each of the multiple range
tests as a function of $\delta = |\mu_1 - \mu_2|/\sigma$, where $\mu_1$ and $\mu_2$ are the population means
corresponding respectively to the largest and smallest of p sample means and $\sigma$ is the population
standard deviation. For various combinations of $\alpha$, $\beta$ and $\delta$, Table 3 gives maximum and
minimum required sample sizes N for each of the multiple range tests. Since, for each test,
the critical range of p means is a non-decreasing function of p, the extreme values of $\alpha$, $\beta$
and N occur for p = 2 and p = m. 


10. On Transient Markov Chains with a Countable Number of States, David
Blackwell, University of California.


Let $X_1, X_2, \ldots$ form a Markov process with stationary transition probabilities and
states the non-negative integers. A set $I$ of states is called almost closed if
$\mathrm{Prob}\{X_n \in I \text{ infinitely often}\} = \mathrm{Prob}\{X_n \in I \text{ for all sufficiently large } n\} > 0$. It is shown that there is a
decomposition of the set of states into disjoint almost closed sets $I_1, I_2, \ldots$ such that (a)
all $I_j$ except at most one are atomic, i.e. do not contain two disjoint almost closed subsets,
(b) the non-atomic $I_j$, if present, has no atomic subsets, and (c) $\sum_j \mathrm{Prob}\{X_n \in I_j \text{ infinitely
often}\} = 1$. If there are independent identically distributed $Y_1, Y_2, \ldots$ with $X_n = Y_1 +
\cdots + Y_n$, then the set of all states is atomic. The results are new only if the process has
transient components. The main tool is the martingale convergence theorem. 


11. A z-transformation and a t-statistic for Serial Correlation Coefficients,
John S. White, University of Manitoba.


An approximate distribution for the sample serial correlation coefficient from a circularly 
correlated population has been obtained by R. B. Leipnik [Annals of Mathematical Statistics, 
Vol. 18 (1947) p. 80]. In this paper several aspects of Leipnik’s distribution will be con- 
sidered. The moments of the distribution are expressible in terms of hypergeometric
functions which may be explicitly evaluated for specific values. The transformation, analogous
to Fisher’s z-transformation for product moment correlation coefficients, is found to be
$z = \arcsin r$. It is shown that $z$ is asymptotically normally distributed with mean $\arcsin \rho$,
where $\rho$ is the population parameter, and variance $1/T$. The statistic $t^* = \sqrt{N + 1}\,(r - \rho)/
\sqrt{1 - r^2}$ has a density function of the type $f(t^*) = s_{N+1}(t^*) + g(t^*)$, where $s_{N+1}(t^*)$ is a density
function for “Student’s t” with $N + 1$ degrees of freedom and $g(t^*)$ is an odd function. The
function $g(t^*)$ goes to zero as the sample size increases. The statistic $(t^*)^2$ has an F
distribution with 1 and $N + 1$ degrees of freedom. Examples are given for the use of $z$ and $t^*$ in
testing hypotheses and in the construction of confidence intervals.


12. Distance Tests with Good Power for the Nonparametric k-sample Problem,
J. Kiefer, Cornell University.


Let $X_{ij}$ be independent ($1 \le i \le n_j$, $1 \le j \le k$), $X_{ij}$ having unknown continuous
distribution function (d.f.) $F_j$. For the nonparametric $k$-sample problem ($k \ge 2$) of testing
$H: F_1 = F_2 = \cdots = F_k$, most known tests have only been shown to have desirable
consistency or power properties against limited classes of alternatives. Let $S_j(x)$ be the sample
d.f. of the $n_j$ observations in the $j$th set. A wide variety of tests of $H$ may be based on various
measures of distance (dispersion) among the $S_j$ which generalize the (possibly weighted)
Kolmogorov-Smirnov and $\omega^2$-type criteria for $k = 2$. For example, supposing all $n_j = n$,
possible critical regions are large values of $V_n = \sup_{g,h,x} |S_g(x) - S_h(x)|$ or of
$T_n = \sup_x \sum_j (S_j(x) - S(x))^2$ or of $\int \sum_j (S_j(x) - S(x))^2\, dS(x)$, where $S(x) = \sum_j S_j(x)/k$. Criteria like
$V_n$ and $T_n$ will usually have the better power properties: e.g., for $0 < \alpha, \beta < 1$, there exists
a value $\delta(\alpha, \beta)$ such that the test of size $\alpha$ based on $V_n$ or $T_n$ has power $> \beta$ against all
alternatives for which $\sup_{g,h,x} |F_g(x) - F_h(x)| > \delta(\alpha, \beta)/\sqrt{n}$. Limiting d.f.’s under $H$
may be found by several methods: e.g., that of $nT_n$ may be obtained by finding in $k - 1$
dimensions the probability of absorption of a Wiener process by a sphere whose radius is
$b(1 + t)$ at time $t$. Other tests using only previously known limiting d.f.’s also have good
power properties. Analogous tests may be used for testing $H': F_1 = \cdots = F_k = G$ where $G$
is specified. 
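
The two supremum criteria $V_n$ and $T_n$ can be computed directly from the sample d.f.'s, as in the sketch below. It relies on the fact that the sample d.f.'s are right-continuous step functions, so each supremum over $x$ is attained at an observed value. The function names are hypothetical, and critical values (which come from the limiting distributions mentioned in the abstract) are not computed.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return the sample d.f. S(x) = (number of observations <= x) / n."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


def distance_statistics(samples):
    """Return (V_n, T_n) for a list of k samples.

    V_n = sup over g, h, x of |S_g(x) - S_h(x)|
    T_n = sup over x of sum_j (S_j(x) - S(x))^2, with S the average of the S_j.
    """
    cdfs = [empirical_cdf(s) for s in samples]
    k = len(samples)
    points = sorted({x for s in samples for x in s})
    v_n = 0.0
    t_n = 0.0
    for x in points:
        values = [cdf(x) for cdf in cdfs]
        mean = sum(values) / k
        v_n = max(v_n, max(values) - min(values))
        t_n = max(t_n, sum((v - mean) ** 2 for v in values))
    return v_n, t_n


v_n, t_n = distance_statistics([[1, 2, 3, 4], [3, 4, 5, 6]])
```

For $k = 2$ the two criteria are tied together, since $\sum_j (S_j - S)^2 = (S_1 - S_2)^2/2$, which gives a quick sanity check on the output.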


13. Comparison of Populations Whose Growth Can Be Described by a Branching
Stochastic Process, A. T. Bharucha-Reid, Columbia University.


In this paper the sequential procedure of Girshick [Ann. Math. Stat., Vol. 17, (1946), 
pp. 123-143] for comparing or ranking two populations is applied to some populations whose 
growth can be described by a branching stochastic process (b.s.p.). Let $A_1$ and $A_2$ be two
populations, each of whose growth or development can be represented by the b.s.p. $\{X(t; \omega),
t \ge 0\}$. The probability density $p(t, x; \omega) = P(X(t; \omega) = x)$, $x = 0, 1, \ldots$, is assumed to be
known except for the value of the parameter $\omega$. It is of interest to test the composite
hypothesis $H_1: \omega_1 < \omega_2$ against the alternative hypothesis $H_2: \omega_1 > \omega_2$, where $\omega_i$ is the value
of the parameter in $p(t, x_i; \omega_i)$, $i = 1, 2$. A Wald sequential probability ratio test is set up
as follows: Select two constants $A$ and $B$, $B < 0 < A$, and two values of $\omega$, say $a_1$ and $a_2$,
where $a_1 < a_2$. The two populations are observed continuously, and on the basis of the
realizations $x_1(t)$ and $x_2(t)$ the decision function
$d(t) = \log\{p(t, x_1; a_2)\, p(t, x_2; a_1) / [p(t, x_1; a_1)\, p(t, x_2; a_2)]\}$ is computed. If for any $t = T$, $d(T) \le B$, we accept $H_2$. If $d(T) \ge A$, we accept
$H_1$. If neither holds we continue to observe. For those b.s.p. which admit a sufficient
statistic for the parameter $\omega$, a new decision function, depending on the realizations alone, can
be defined. In these cases the decision boundaries are functions of time and the parameter
values $a_1$ and $a_2$. The test is applied to the birth, death, birth-and-death, and Pólya
processes. Possible applications of this procedure to some problems in biology, physics, soci-
ology, and telephone and industrial engineering are discussed. A detailed treatment of a 
problem concerned with the comparison of two stochastic epidemics is given. 





NEWS AND NOTICES 
Readers are invited to submit to the Secretary of the Institute news items of interest 


Personal Items 


Helen Bozivich, Research Associate in the Statistical Laboratory of Iowa 
State College, has accepted a position at Purdue University as Assistant Pro- 
fessor in its Statistical Laboratory beginning July 1, 1955. She was awarded 
the degree of Doctor of Philosophy in statistics by Iowa State College in June, 
based partly on a dissertation, “Power of test procedures for certain incom- 
pletely specified random and mixed models.” 

Edward C. Bryant received a Ph.D. degree in statistics in June 1955 from the 
Department of Statistics, Iowa State College, his thesis being entitled “An 
analysis of some two-way stratifications.” Dr. Bryant is Associate Professor and 
Chairman of the Department of Statistics at the University of Wyoming. 

D. R. Cox, of the Statistical Laboratory, University of Cambridge, spent 
July and August, 1955, as a Research Associate in the Section of Mathematical 
Statistics, Princeton University, and then took the post of Visiting Professor in 
the Department of Biostatistics, School of Public Health and in the Institute 
of Statistics, University of North Carolina, Chapel Hill. 

Dr. H. A. David has resigned from his position with the Commonwealth 
Scientific and Industrial Research Organization of Australia to take up a Senior 
Lectureship in Statistics in the University of Melbourne. 

Professor Wilfrid J. Dixon has accepted a position as Professor of Biostatistics 
at the Medical School, University of California at Los Angeles. 

Upon completion of basic training at Camp Chaffee, Arkansas, Charles E. 
Gates was assigned as “Statistics Assistant” to Board No. 2, 8576DU, CONARC, 
Fort Knox, Kentucky. 

Irwin Guttman has received his Ph.D. from the University of Toronto, and 
has been appointed as an Assistant Professor of Statistics on the staff of the 
University of Alberta. 

W. J. Hall has received a commission in the United States Public Health Serv- 
ice and is stationed at present at the Communicable Disease Center in Atlanta, 
Georgia. 

John F. Hofmann, Chief Statistician of the Naval Ordnance Laboratory at 
Corona, California, was granted a Ph.D. degree in statistics by Iowa State College
in June, 1955. His dissertation concerned “Life testing in controlled environmental
conditions.”

William Kruskal (Committee on Statistics, University of Chicago) is a visiting 
faculty member at the Department of Statistics, University of California, 
Berkeley, during the 1955-1956 academic year. 

William G. Madow is a Visiting Professor of Statistics at the Department of 
Statistics, Stanford University, Stanford, California during the academic year 
1955-1956. 

Richard B. McHugh has been promoted to Associate Professor at Iowa State 
College effective July 1, 1955. He holds a joint appointment in the Department 
of Psychology and the Department of Statistics. 

Professor J. E. Morton has resigned from Cornell University and has accepted 
a position as Chief Statistician and Head, Office of Special Studies with the 
National Science Foundation in Washington, D. C. 

Dr. D. V. Newton, formerly with the Applied Physics Laboratory, University 
of Washington, has joined the Applied Science Division of the International 
Business Machines Corporation. 

Gottfried E. Noether of Boston University has been promoted to Associate 
Professor. 

Dr. George W. Petrie was recently promoted to the position of Manager of the 
Programming Research and Development Department on the Electronic Data 
Processing Machines Division of IBM. The department is located with the Re- 
search Laboratory at Poughkeepsie, New York. 

Dr. Benjamin J. Tepping has resigned his position as Mathematical Statis- 
tician in the Bureau of the Census to join the staff of National Analysts, Inc., 
Philadelphia, Pennsylvania where he will work on operations research problems. 

Dr. W. R. Van Voorhis, Professor of Mathematics at Fenn College, has been 
appointed Chairman of the Department. 

Eugene F. Witeck, formerly with Boeing Airplane Company, Seattle, Wash- 
ington, is now a Quality Control Engineer with the Sandia Corporation, Albu- 
querque, New Mexico. 

Wassily Hoeffding is a Visiting Associate Professor of Mathematical Sta- 
tistics in Columbia University, New York, N. Y., during the Winter Session, 
1955-56. 

Martin B. Wilk, appointed Assistant Professor in the Statistical Laboratory 
at Iowa State College last April, is on leave for one year beginning July 1, 1955, 
for basic research at Princeton University under an Office of Naval Research 
contract in the statistics section of the Department of Mathematics. Prof. 
Wilk completed requirements for a Ph.D. degree in statistics at Iowa State 
College in June 1955, with a thesis on ‘Linear models and randomized experi- 
ments.” 


Preliminary Actuarial Examinations Prize Awards 


The winners of the prize awards offered by the Society of Actuaries to the nine 
undergraduates ranking highest on the score of Part 2 of the 1955 Preliminary 
Actuarial Examination are as follows: 


First Prize $200 
Rape, DOV                   Rensselaer Polytechnic Institute 

Additional Prizes of $100 

[name illegible]            Yale University 
Horn, William A.            University of Cincinnati 
McCullough, Roger S.        University of Toronto 
Mitchell, Barry M.          University of Toronto 
Moskowitz, Lester           Brooklyn College 
[name illegible]            University of Michigan 
Sprung, Donald W. L.        University of Toronto 
[name illegible]            University of Toronto 

The Society of Actuaries has authorized a similar set of nine prizes for the 
1956 examinations on Part 2. 

The Preliminary Actuarial Examinations consist of the following three exami- 
nations: 

Part 1. Language Aptitude Examination. (Reading comprehension, meaning of words and
word relationships, antonyms, and verbal reasoning.)

Part 2. General Mathematics Examination. (Algebra, trigonometry, coordinate geometry,
differential and integral calculus.)

Part 3. Special Mathematics Examination. (Finite differences, probability and statistics.)

The 1956 Preliminary Actuarial Examinations will be prepared by the Edu- 
cational Testing Service and will be administered by the Society of Actuaries 
at centers throughout the United States and Canada on May 9, 1956 (tentative 
date). The closing date for application is March 15, 1956. 

Detailed information concerning the Examinations can be obtained from: 

The Society of Actuaries 
208 South LaSalle Street 
Chicago 4, Illinois 


New Members 


The following persons have been elected to membership in the Institute 
May 12, 1955 to August 5, 1955 


Addelman, Sidney, B.A. (Carleton College), Graduate Teaching Assistant, University of 
Delaware, Newark, Delaware. 

Ahmed, M. Salahuddin, M.A. (Osmania Univ.), Teaching Assistant, Statistical Labora- 
tory, University of California, Berkeley 4, California. 

Cyert, Richard M., Ph.D. (Columbia Univ.), Associate Professor of Economics, Carnegie 
Institute of Technology, Pittsburgh 13, Pennsylvania. 

Ellison, Bob E., Student, University of Chicago, 5801 Ellis Avenue, Chicago 37, Illinois. 

Estes, William K., Ph.D. (Univ. of Minn.), Professor of Psychology, Indiana University, 
Bloomington, Indiana. 

Friedman, Lawrence, M.A. (Chicago Univ.), Research Assistant, Operations Research, 
Case Institute of Technology, Cleveland, Ohio, 2724 Mayfield, Cleveland Heights, Ohio. 

Gammon, Edward R., M.A. (Univ. of Oregon), Mathematician, Northrop Aircraft, Hawthorne,
California, 2821 Bayview Drive, Manhattan Beach, California. 

Gautschi, Werner, Ph.D. (Univ. of Basel), Research Fellow, Statistical Laboratory, De- 
partment of Mathematics, University of California, Berkeley, California. 

George, Aleyamma (Miss), M.A. (Univ. of Travancore), Statistical Laboratory, Univer- 
sity of Travancore, Trivandrum, Travancore. 

Gepfert, Alan H., M.S. (Case Inst. of Tech.), Operations Research Analysis, Research As- 
sistant, Project Doan Brook, Case Institute of Technology, University Circle, Cleve- 
land 6, Ohio, 3525 Washington Boulevard, Cleveland Heights 18, Ohio. 

Hesse, Herbert, M.S. (Univ. of Chicago), Student, Committee on Statistics, University of 
Chicago, Chicago 37, Illinois, 444 West 98th Place, Chicago 28, Illinois. 

Hinkle, M. L., Jr., M.S. (Purdue Univ.), Student, Purdue University, Lafayette, Indiana, 
603 West 4th Street, Marion, Indiana. 

Katz, Melvin, Jr., B.S. (Calif. Inst. of Tech.), Graduate Student, University of California, 
Statistical Laboratory, Berkeley 4, California. 

King, Kenneth R., M.A. (Univ. of Colorado), Graduate Student, Purdue University, West 
Lafayette, Indiana, 812 Vine Street, West Lafayette, Indiana. 

Lee, I. M., Ph.D. (Iowa State College), Associate Professor of Agricultural Economics, 
University of California, Berkeley 4, California, 207 Giannini Hall, University of Cal- 
ifornia, Berkeley 4, California. 

Mitra, Sujit Kumar, M. Sc. (Calcutta), Research Assistant, Department of Statistics, Uni- 
versity of North Carolina, Phillips Hall, Chapel Hill, North Carolina. 

Naddor, Eliezer, M.S. (Columbia Univ.), Research Associate, Operations Research Group, 
Case Institute of Technology, 10900 Euclid Avenue, Cleveland 6, Ohio. 

Nag, S. K., M.Sc. (Allahabad Univ.), Research Officer, Bihar Institute of Hydraulic and 
Allied Research, Mamal Road, Patna, India. 

Pai, Mangalore Vasudeva, M.A. (Madras Univ.), Research Student, Statistical Laboratory, 
Engineering Administration Building, Purdue University, West Lafayette, Indiana. 

Robson, Douglas S., Ph.D. (Cornell Univ.), Assistant Professor, Biometrics Unit, Cornell 
University, Ithaca, New York. 

Sherman, Gordon R., M.S. (Stanford Univ.), Research Assistant, Statistical Laboratory, 
Purdue University, West Lafayette, Indiana. 

Shroyer, Robert J., B.S. (Univ. of Dayton), Graduate Student, Statistical Laboratory, 
Purdue University, Lafayette, Indiana. 

Siddiqui, M. M., M.A. (American Univ.), Senior Lecturer, Panjab University, Department 
of Statistics, Panjab University, Lahore, Pakistan, Institute of Statistics, University 
of North Carolina, Chapel Hill, North Carolina. 

Sterling, Theodor D., Ph.D., (Tulane Univ.), Assistant Professor, University of Alabama, 
University, Alabama, P.O. Box 3187, University, Alabama. 

Storm, Leo E., B.A. (Oklahoma A. and M. College), Mathematician, Aerial Measurements 
Laboratory, Northwestern University, 2422 West Oakton Street, Evanston, Illinois, 
6720 South Jeffrey Boulevard, Chicago 49, Illinois. 

Sudman, Seymour, B.S. (Roosevelt College), Analytical Statistician, Market Research 
Corporation of America, 425 North Michigan Avenue, Chicago 11, Illinois, 2633 East 
74th Street, Chicago 49, Illinois. 

Thomas, Alan T., M.S. (Univ. of Louisville), Research Engineer, Brown-Forman Distillers 
Corporation, P.O. Box 1080, Louisville, Kentucky, R.R. No. 8, Box 26, Louisville, 
Kentucky. 

Walpole, R. E., M.A. (McMaster Univ.), Student, Virginia Polytechnic Institute, Blacks- 
burg, Virginia. 

Yaspan, Arthur J., M.S. (Univ. of Chicago), Instructor, Department of Mathematics, 
Western Reserve University, Cleveland, Ohio, 1563 Knuth Avenue, Euclid, Ohio. 





REPORT OF THE BERKELEY MEETING OF THE INSTITUTE OF 
MATHEMATICAL STATISTICS 


The sixty-sixth meeting, a Western Regional meeting, of the Institute of 
Mathematical Statistics was held at the University of California, Berkeley, 
California, on July 14-16, 1955. The meeting was held in conjunction with the 
Third Berkeley Symposium on Mathematical Statistics and Probability. There 
was a joint dinner with the Symposium on Thursday night and a beer party on 
Friday night. 

The following 77 members of the Institute attended: 


G. A. Baker, M. S. Bartlett, G. E. Bates, C. B. Bell, J. J. Birch, A. Birnbaum, Z. W. 
Birnbaum, D. Blackwell, C. Boll, A. H. Bowker, R. N. Bradt, J. V. Breakwell, C. L. Chiang, 
K. L. Chung, K. G. Clemans, A. Court, E. L. Crow, R. C. Davis, J. L. Doob, R. Dorfman, 
A. J. Duncan, M. Dwass, J. R. Eklind, M. Elveback, B. Epstein, E. A. Fay, T. S. Ferguson, 
R. Fortet, E. Fix, M. Fox, W. Gautschi, E. J. Gilbert, G. H. Golub, G. Gregory, T. E. 
Harris, J. L. Hodges, Jr., W. Hoeffding, W. W. Hoy, J. P. Imhof, Kiyosi Ito, Koichi Ito, 
S. Karlin, W. M. Kincaid, H. S. Konijn, C. H. Kraft, M. Krakowski, W. Kruskal, L. LeCam, 
E. L. Lehmann, G. J. Lieberman, D. V. Lindley, W. G. Madow, A. M. Mood, J. Neyman, 
D. B. Owen, J. Pachares, R. R. Read, G. J. Resnikoff, M. Rosenblatt, A. R. Roy, H. Rubin, 
J. S. Rustagi, M. Sandomire, I. R. Savage, H. Scheffé, L. Schwartz, E. L. Scott, A. Shapiro, 
G. P. Steck, C. Stein, M. D. Stein, W. F. Taylor, D. R. Truax, H. G. Tucker, J. Walsh, 
A. Wiggins, O. Wesler. 


The program of the meeting was as follows: 


THURSDAY, JULY 14, 1955 
9:30-10:30 a.m. Invited Address 
Chairman: Joseph L. Doob, University of Illinois 


(1) A Gaussian Random Function of a Point of Hilbert Space, Paul Lévy, L’École
Polytechnique. 


10:45-11:45 a.m. Recent Results in Large Sample Theory 


Chairman: Allan Birnbaum, Columbia University and Stanford University. 
(1) Concepts of Consistency in Relation to Estimates Obtainable as Solutions of Certain
Equations, Charles Kraft and Lucien LeCam, University of California. 


(2) Distribution of $\chi^2$ for Random Classes, A. R. Roy, Department of Agriculture, India, 
and Stanford University. 


1:30-3:30 p.m. Invited Addresses 
Chairman: Henry Scheffé, University of California 


(1) Stochastic Processes, Statistical Inference and the Specification Problem, M. S. Bartlett, 
University of Manchester. 
(2) Statistical Applications of General Random Elements, Robert Fortet, Institut Henri 
Poincaré. 


3:45-5:00 p.m. Contributed Papers, I 
Chairman: George A. Baker, University of California at Davis 


(1) Nonparametric Mean Estimation of Percentage Points and Density Function Values, 
John E. Walsh, Lockheed Aircraft. 

(2) On the Concept of Probability in Quantum Mechanics, A. O. Barut, Stanford, Sponsored 
by M. Loève. 

(3) Two-Sample Estimates of Prescribed Precision (Preliminary Report), Allan Birnbaum, 
Columbia University and Stanford University. 

(4) Minimax Test for the Parameter of a Poisson Process, J. V. Breakwell, North American 
Aviation, Inc. 

(5) Distribution of a Definite Quadratic Form in Non-Central Normal Variates, James 
Pachares, Hughes Aircraft Company. 

(6) Errors in Normal Approximations to Certain Types of Distribution Functions (By Title), 
J. T. Chu, University of North Carolina. 

(7) Estimation of the Parameters of the One- and Two-Parameter Single Exponential
Distributions from Singly and Doubly Censored Samples (By Title), A. E. Sarhan and 
B. G. Greenberg, University of North Carolina. 

(8) The Joint Distribution of Serial Correlation Coefficients (By Title), G. S. Watson, 
Australian National University. 


FRIDAY, JULY 15, 1955 
11:30 a.m.—1:30 p.m. Invited Addresses 


Chairman: A. M. Mood, General Analysis Corporation. 


(1) A New Monte Carlo Technique Depending on the Use of Negative Correlations, J. M. 
Hammersley, Oxford University. 

(2) Comparison of Experiments and Information Theory, D. V. Lindley, University of 
Cambridge and Stanford University. 


2:00-3:00 p.m. Invited Address 
Chairman: Evelyn Fix, University of California. 


(1) Fluctuation in the Transparency of Photographic Films, A. J. L. Blanc-Lapierre,
Université d’Alger. 


3:15-5:00 p.m. Recent Results in Decision Theory 


Chairman: William G. Madow, University of Illinois.


(1) Stochastic Decision Processes and Dynamic Programming, R. E. Bellman, The RAND
Corporation.


(2) A Modified Minimax Principle, Oscar Wesler, Stanford University.
(3) Further Results on Decision Theory for Polya Type Distributions, Samuel Karlin,
California Institute of Technology and Stanford University.


SATURDAY, JULY 16, 1955 
9:30-10:30 a.m. Invited Address 


Chairman: David Blackwell, University of California.


(1) On the Use of the U-Statistic, Z. W. Birnbaum, University of Washington.
11:00-12:00 a.m. Contributed Papers, II


Chairman: Albert H. Bowker, Stanford University.





(1) On Tests of Certain Hypotheses Invariant Under the Full Linear Group (Preliminary
Report), Charles M. Stein, Stanford University.

(2) Asymptotic Formulae for the Distribution of Hotelling’s Generalized T_0^2 Statistic,
Koichi Ito, University of North Carolina.

(3) Asymptotic Distribution of Maximum Likelihood Estimates in Factor Analysis in the
Loading-Normalized Case, Herman Rubin, Stanford University.

(4) A Run Test of the Hypothesis that the Median of a Stochastic Process is Constant, T. S.
Ferguson and Charles H. Kraft, University of California.

(5) Analysis of Dispersion on a Sphere (By Title), G. S. Watson, The Australian National
University.

(6) Convergence in Distribution and Fourier-Stieltjes Transforms of Random Functions (By
Title, Preliminary Report), Emanuel Parzen, Columbia University.

(7) On the Statistical Analysis of Markov Chains (By Title, Preliminary Report), Leo A.
Goodman, University of Chicago.

(8) Convergence Properties of Elements of a Certain Class of Stochastic Approximation
Processes, Continued (By Title), Donald L. Burkholder, University of North Carolina.


CHARLES H. KRAFT
Assistant Secretary 


REPORT OF THE ANN ARBOR MEETING OF THE INSTITUTE OF 
MATHEMATICAL STATISTICS 


The sixty-seventh meeting, and seventeenth summer meeting, of the Institute 
of Mathematical Statistics was held at the University of Michigan, Ann Arbor, 
Michigan, on August 30-September 2, 1955. The meeting was in conjunction 
with the summer meetings of the American Mathematical Society, the Mathe- 
matical Association of America, the Association for Symbolic Logic, the Econo- 
metric Society, and the Society for Industrial and Applied Mathematics. A 
Special Invited Paper entitled Statistical Problems in Genetics was presented by 
Howard Levene, Columbia University; and a Special Invited Paper entitled 
Alternate Models for the Analysis of Variance was presented by Henry Scheffé, 
University of California. 

The following 125 members of the Institute attended: 


Om P. Aggarwal, G. E. Albert, W. R. Allen, C. B. Allendoerfer, R. L. Anderson, T. W. 
Anderson, K. J. Arnold, K. J. Arrow, J. L. Bagg, R. E. Beckwith, T. G. Birdsall, David 
Blackwell, C. R. Blyth, Herman Chernoff, A. G. Clark, A. B. Clarke, T. F. Cope, A. H. 
Copeland, Sr., L. J. Cote, L. M. Court, C. C. Craig, J. H. Curtiss, D. A. Darling, H. T. 
David, E. L. Diamond, J. L. Doob, David Durand, P. S. Dwyer, Marjorie J. Easterbrook,
Churchill Eisenhart, H. P. Evans, C. H. Fischer, J. Sutherland Frame, Lawrence Fried- 
man, T. C. Fry, H. M. Gehman, J. W. Gilmore, W. A. Golomski, J. A. Greenwood, T. N. E. 
Greville, Paul Gunther, J. W. Hamblen, P. C. Hammer, J. F. Hannan, H. L. Harter, H. O. 
Hartley, Clifford Hildreth, R. V. Hogg, W. C. Hood, A. S. Householder, Paul Irick, Walter
Jacobs, E. H. Jebe, Mark Kac, Samuel Karlin, Leo Katz, M. B. Keats, J. H. B. Kemperman, 
W. M. Kincaid, Leslie Kish, L. A. Knowler, T. C. Koopmans, S. Kullback, D. E. Lamphiear,
Marguerite Lehr, R. A. Leibler, F. C. Leone, Howard Levene, M. M. Lightstone, S. P.
Lloyd, F. W. Lott, Mrs. Mary D. Lum, G. F. Lunger, Brockway McMillan, W. G. Madow, 
J. W. Mauchly, Paul Meier, D. F. Mela, D. M. Mesner, Robert Mirsky, Norman Morse, 
Frederick Mosteller, S. C. Moy, Mervin Muller, A. C. Nelson, Jr., C. J. Nesbitt, G. E.
Nicholson, Jr., G. E. Noether, J. I. Northam, R. E. Odeh, E. G. Olds, Ingram Olkin, T. M.
Oneson, Bernard Ostle, Toby Oxtoby, Emanuel Parzen, R. S. Pinkham, G. B. Price, Henry
Quastler, Stanley Reiter, G. J. Resnikoff, P. R. Rider, Selby Robinson, Albert C. Rohloff, 
David Rosenblatt, Herman Rubin, R. W. Rutledge, Jerome Sacks, Esther B. Schaeffer, 
Henry Scheffé, Herbert Solomon, P. N. Somerville, F. A. Sorensen, Rothwell Stephens, 
T. Sterling, A. D. Stewart, Zenon Szatrowski, R. M. Thrall, C. K. Tsao, A. W. Tucker, 
J. W. Tukey, J. B. Tysver, S. S. Wilks, J. Wolfowitz, Max Woodbury.


The program of the meeting was as follows: 
TUESDAY, AUGUST 30, 1955 
9:00 a.m. Information Theory 


Chairman: Henry Quastler, University of Illinois 
Papers: 1. Probability and Statistics in Communications, Brockway McMillan, Bell 
Telephone Laboratories. 
2. The Coding Problem in Information Theory, David Slepian, Bell Telephone 
Laboratories. 
3. The Distribution of Sample Information Functions, Herbert T. David, Uni- 
versity of Chicago. 
Discussion: J. L. Doob, University of Illinois 
W. G. Madow, University of Illinois 


1:30 p.m. The Use of Automatic Computers in Solving Statistical Problems 


Chairman: Paul R. Rider, Wright Air Development Center 
Paper: Standard Programs for the Varied Problems in Statistical Analysis, H. O. Hartley,
Iowa State College.


3:30 p.m. Contributed Papers I 


Chairman: Louis Cote, Purdue University 
Papers: 1. Distribution of the m-th Range, J. Arthur Greenwood, Manhattan Life Insur- 
ance Company. 

2. Approximation to the Distribution of the Sum of Cosines of Random Angles 
(Preliminary Report), David Durand, Massachusetts Institute of Tech- 
nology, and J. Arthur Greenwood, Manhattan Life Insurance Company. 

3. Crystalline Connectivity, J. M. Hammersley, University of Oxford, (intro- 
duced by W. H. Kruskal, University of California). 

4. Distributions of Roots of Algebraic Equations with Variable Coefficients, John 
W. Hamblen, Oklahoma Agricultural and Mechanical College. 

5. On a Modified Sum of Poisson Processes, A. Bruce Clarke, University of 
Michigan. 


7:30 p.m. Council meeting. Institute of Mathematical Statistics 
Presiding: Henry Scheffé, University of California 
WEDNESDAY, AUGUST 31, 1955


9:00 a.m. Decisions under Uncertainty (Co-sponsored with the Econometric 
Society) 


Chairman: J. Wolfowitz, Cornell University 
Papers: 1. Foundations of a Stochastic Theory of Organizations, David Rosenblatt, 
American University. 
2. The Use of Probability Forecasts in Making a Sequence of Decisions under a 
Quadratic Criterion, Herbert Simon, Carnegie Institute of Technology. 
Discussion: Stanley Reiter, Purdue University, Max Woodbury, George Washington 
University 


11:00 a.m. Special Invited Paper 


Chairman: Leo Katz, Michigan State University 
Paper: Statistical Problems in Genetics, Howard Levene, Columbia University 


1:30 p.m. Contributed Papers II


Chairman: Donald Darling, University of Michigan 
Papers: 1. A Method of Constructing Nonparametric Multivariate Tests (Preliminary
Report), T. W. Anderson, Columbia University.
2. Confidence Intervals for a Measure of Effectiveness (Preliminary Report),
Gottfried E. Noether, Boston University.
3. Truncated Binomial and Negative Binomial Distributions, Paul R. Rider,
Wright-Patterson Air Force Base.
4. Error Rates and Sample Sizes for Multiple Range Tests, H. Leon Harter,
Wright-Patterson Air Force Base.
5. On Transient Markov Chains with a Countable Number of States, David
Blackwell, University of California.
6. A z-transformation and a t-statistic for Serial Correlation Coefficients, John
S. White, University of Manitoba, (By title).
7. Distance Tests with Good Power for the Nonparametric k-sample Problem, J.
Kiefer, Cornell University, (By title).
8. Comparison of Populations Whose Growth Can Be Described by a Branching
Stochastic Process, A. T. Bharucha-Reid, Columbia University, (By title).


THURSDAY, SEPTEMBER 1, 1955 
9:00 a.m. Business meeting, Institute of Mathematical Statistics 
Presiding: Henry Scheffé, University of California 
4:00 p.m. Special Invited Paper 


Chairman: C. C. Craig, University of Michigan 
Paper: Alternate Models for the Analysis of Variance, Henry Scheffé, University of
California


FRIDAY, SEPTEMBER 2, 1955 
9:00 a.m. Recent Advances in Combinatorial Analysis. 


Chairman: H. B. Mann, Ohio State University 
Papers: 1. Numerical Analysis of Combinatorial Designs, Marshall Hall, Jr., Ohio State 
University. 
2. The Solution of Some Computational Problems in Combinatorial Analysis by
High Speed Computing Machines, Morris Newman, National Bureau of
Standards.


3. Systems of Distinct Representatives and Other Combinatorial Applications of 
Linear Programming, A. J. Hoffman, Office of Naval Research and National 
Bureau of Standards, and H. W. Kuhn, Bryn Mawr College and Princeton 
University. 

4. Combinatorial Problems of Experimental Design, R. C. Bose, University of 
North Carolina. 


I thank Paul S. Dwyer, University of Michigan, for providing the above 
information. 


WILLIAM KRUSKAL 
Associate Secretary 
Censo de la Poblacion de España (1950), Vol. II, Instituto Nacional de Estadistica,
1954, 493 pp.

Statistical Theory of Extreme Values and Some Practical Applications, by E. J. 
Gumbel, National Bureau of Standards Applied Mathematics Series 33, 
51 pp., $.40. (Order from Government Printing Office, Washington 25, D. C.) 